Proceedings. International Conference on Intelligent Systems for Molecular Biology: Latest Articles

Quantitative, scalable discrete-event simulation of metabolic pathways.
P A Meric, M J Wise

DMSS (Discrete Metabolic Simulation System) is a framework for modelling and simulating metabolic pathways. Quantitative simulation of metabolic pathways is achieved using discrete-event techniques. The approach differs from most quantitative simulators of metabolism which employ either time-differentiated functions or mathematical modelling techniques. Instead, models are constructed from biochemical data and biological knowledge, with accessibility and relevance to biologists serving as key features of the system.
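The discrete-event idea can be illustrated with a minimal sketch, assuming a linear pathway A → B → C in which each enzyme (hypothetically named E1 and E2) attempts one conversion at a fixed interval. The abstract does not describe DMSS's model format or kinetics, so the `simulate` function, the fixed-interval rule and the molecule counts below are all illustrative. Events sit in a time-ordered priority queue, and the simulation clock jumps from one event to the next instead of integrating differential equations.

```python
import heapq


def simulate(counts, reactions, t_end):
    """Event-driven simulation of a toy linear pathway (illustrative, not DMSS).

    counts: dict mapping metabolite name -> number of molecules.
    reactions: list of (name, substrate, product, interval) tuples; each enzyme
        is assumed (hypothetically) to attempt one conversion every `interval`.
    Returns the final counts and a log of (time, reaction) events that fired.
    """
    counts = dict(counts)  # do not mutate the caller's dict
    # Event queue ordered by time; the reaction index breaks ties deterministically.
    queue = [(interval, i) for i, (_, _, _, interval) in enumerate(reactions)]
    heapq.heapify(queue)
    log = []
    while queue and queue[0][0] <= t_end:
        t, i = heapq.heappop(queue)
        name, substrate, product, interval = reactions[i]
        if counts.get(substrate, 0) > 0:  # fire only if a substrate molecule exists
            counts[substrate] -= 1
            counts[product] = counts.get(product, 0) + 1
            log.append((t, name))
        heapq.heappush(queue, (t + interval, i))  # schedule this enzyme's next attempt
    return counts, log


final, events = simulate({"A": 5, "B": 0, "C": 0},
                         [("E1", "A", "B", 1.0), ("E2", "B", "C", 2.0)],
                         t_end=4.0)
# final == {"A": 1, "B": 2, "C": 2}; six conversion events fired
```

Molecules are conserved by construction, since every event moves exactly one molecule from substrate to product.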

Rapid assessment of extremal statistics for gapped local alignment.
R Olsen, R Bundschuh, T Hwa

The statistical significance of gapped local alignments is characterized by analyzing the extremal statistics of the scores obtained from the alignment of random amino acid sequences. By identifying a complete set of linked clusters, "islands," we devise a method which accurately predicts the extremal score statistics by using only one to a few pairwise alignments. The success of our method relies crucially on the link between the statistics of island scores and extremal score statistics. This link is motivated by heuristic arguments, and firmly established by extensive numerical simulations for a variety of scoring parameter settings and sequence lengths. Our approach is several orders of magnitude faster than the widely used shuffling method, since island counting is trivially incorporated into the basic Smith-Waterman alignment algorithm with minimal computational cost, and all islands are counted in a single alignment. The availability of a rapid and accurate significance estimation method gives one the flexibility to fine-tune scoring parameters to detect weakly homologous sequences and obtain optimal alignment fidelity.
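Island counting can ride along with the Smith-Waterman recursion. The following is a minimal sketch, assuming a linear gap cost and simple match/mismatch scores (the abstract does not give the scoring scheme, and `island_scores` is a hypothetical name). Each positive cell inherits the anchor of the predecessor it was computed from; a diagonal step out of a zero cell starts a new island, and an island's score is the largest value among its cells. All islands come out of one alignment pass. The subsequent step of estimating the extreme-value parameters from the tail of the island-score distribution is omitted.

```python
def island_scores(a, b, match=1, mismatch=-1, gap=2):
    """Smith-Waterman with linear gaps, returning the peak score of every island.

    An island is the set of positive cells whose traceback leads to the same
    anchor (a cell reached diagonally from a zero cell). Scores are sorted
    from highest to lowest; the first one equals the optimal local score.
    """
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    origin = [[None] * (m + 1) for _ in range(n + 1)]
    peak = {}  # anchor cell -> highest score reached within that island
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            diag = score[i - 1][j - 1] + s
            up = score[i - 1][j] - gap
            left = score[i][j - 1] - gap
            best = max(0, diag, up, left)
            if best == 0:
                continue  # cell stays 0 and belongs to no island
            if best == diag:
                # A diagonal step out of a zero cell anchors a new island.
                anchor = origin[i - 1][j - 1] if score[i - 1][j - 1] > 0 else (i, j)
            elif best == up:
                anchor = origin[i - 1][j]
            else:
                anchor = origin[i][j - 1]
            score[i][j] = best
            origin[i][j] = anchor
            peak[anchor] = max(peak.get(anchor, 0), best)
    return sorted(peak.values(), reverse=True)


scores = island_scores("AAA", "AAA")  # [3, 2, 2, 1, 1]: main diagonal plus shifted diagonals
```

The extra cost over plain Smith-Waterman is one anchor per cell and one dictionary update, which is why counting islands adds little to a single alignment.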

Nearest neighbor classification in 3D protein databases.
M Ankerst, G Kastenmüller, H P Kriegel, T Seidl

In molecular databases, structural classification is a basic task that can be successfully approached by nearest neighbor methods. The underlying similarity models consider spatial properties such as shape and extension as well as thematic attributes. We introduce 3D shape histograms as an intuitive and powerful approach to model similarity for solid objects such as molecules. Errors of measurement, sampling, and numerical rounding may result in small displacements of atomic coordinates. These effects may be handled by using quadratic form distance functions. An efficient processing of similarity queries based on quadratic forms is supported by a filter-refinement architecture. Experiments on our 3D protein database demonstrate the high classification accuracy of more than 90% and the good performance of the technique.
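Below is a minimal sketch of a shell-model shape histogram and a quadratic form distance d_A(x, y) = sqrt((x - y)^T A (x - y)). It assumes shells centred on the centroid and a hypothetical bin-similarity matrix with entries a_ij = exp(-sigma * |i - j|). The abstract specifies neither the histogram model nor the matrix, and the function names are illustrative. Because A gives partial credit to neighbouring shells, an atom displaced into the adjacent shell moves the histogram less than one displaced several shells away. Plain Euclidean distance would treat both displacements the same.

```python
import math


def shell_histogram(coords, n_shells, r_max):
    """Count atoms per concentric shell around the centroid.

    Atoms farther than r_max are clamped into the outermost shell.
    """
    n = len(coords)
    center = tuple(sum(p[k] for p in coords) / n for k in range(3))
    width = r_max / n_shells
    hist = [0] * n_shells
    for p in coords:
        k = min(int(math.dist(p, center) / width), n_shells - 1)
        hist[k] += 1
    return hist


def similarity_matrix(n_bins, sigma):
    """Hypothetical bin similarity: 1 on the diagonal, decaying with bin distance."""
    return [[math.exp(-sigma * abs(i - j)) for j in range(n_bins)] for i in range(n_bins)]


def quadratic_form_distance(h1, h2, A):
    """sqrt((h1 - h2)^T A (h1 - h2)); clamped at 0 to absorb rounding."""
    d = [x - y for x, y in zip(h1, h2)]
    q = sum(d[i] * A[i][j] * d[j] for i in range(len(d)) for j in range(len(d)))
    return math.sqrt(max(q, 0.0))


A = similarity_matrix(4, sigma=1.0)
near = quadratic_form_distance([1, 0, 0, 0], [0, 1, 0, 0], A)  # shift by one shell
far = quadratic_form_distance([1, 0, 0, 0], [0, 0, 0, 1], A)   # shift by three shells
```

The Laplacian form exp(-sigma * |i - j|) is positive definite, so the result behaves as a proper distance. In a filter-refinement setup, a cheaper lower-bounding distance would prune candidates before this full O(n^2) evaluation.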

Constructing biological knowledge bases by extracting information from text sources.
M Craven, J Kumlien

Recently, there has been much effort in making databases for molecular biology more accessible and interoperable. However, information in text form, such as MEDLINE records, remains a greatly underutilized source of biological information. We have begun a research effort aimed at automatically mapping information from text sources into structured representations, such as knowledge bases. Our approach to this task is to use machine-learning methods to induce routines for extracting facts from text. We describe two learning methods that we have applied to this task, a statistical text classification method and a relational learning method, as well as our initial experiments in learning such information-extraction routines. We also present an approach to decreasing the cost of learning information-extraction routines by learning from "weakly" labeled training data.
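The statistical text classification side can be pictured with a minimal sketch. It assumes a multinomial naive Bayes model with Laplace smoothing, since the abstract does not name the classifier. It also uses a hypothetical weak-labeling rule: a sentence counts as positive if it mentions both members of a pair already recorded in a database. The protein names, locations, and the `weak_label` helper are invented for the example. The relational learning method is not sketched.

```python
import math
from collections import Counter


def tokenize(sentence):
    return sentence.lower().replace(".", " ").replace(",", " ").split()


def weak_label(sentence, known_pairs):
    """Hypothetical weak labeling: positive if a known (protein, location)
    pair from an existing database is mentioned in the same sentence."""
    words = set(tokenize(sentence))
    return any(p in words and loc in words for p, loc in known_pairs)


class NaiveBayes:
    """Multinomial naive Bayes over word counts with add-one smoothing."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.prior = {c: math.log(labels.count(c) / len(labels)) for c in self.classes}
        self.counts = {c: Counter() for c in self.classes}
        for doc, y in zip(docs, labels):
            self.counts[y].update(tokenize(doc))
        self.vocab = set().union(*self.counts.values())
        self.total = {c: sum(self.counts[c].values()) for c in self.classes}
        return self

    def log_posterior(self, doc, c):
        v = len(self.vocab)
        score = self.prior[c]
        for w in tokenize(doc):
            if w in self.vocab:  # unseen words are ignored
                score += math.log((self.counts[c][w] + 1) / (self.total[c] + v))
        return score

    def predict(self, doc):
        return max(self.classes, key=lambda c: self.log_posterior(doc, c))


sentences = [
    "Rpa1 is localized to the nucleus.",
    "Sec61 is localized to the endoplasmic reticulum.",
    "Act1 binds profilin in vitro.",
    "Expression of Cdc28 was measured by northern blot.",
]
known = [("rpa1", "nucleus"), ("sec61", "reticulum")]
labels = [weak_label(s, known) for s in sentences]  # no hand labeling needed
model = NaiveBayes().fit(sentences, labels)
```

The weak labels are noisy (a sentence can mention a pair without asserting the relation), which is the trade-off such a scheme accepts in exchange for avoiding manual annotation.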

Solving large scale phylogenetic problems using DCM2.
D H Huson, L Vawter, T J Warnow

In an earlier paper, we described a new method for phylogenetic tree reconstruction called the Disk Covering Method, or DCM. This is a general method which can be used with any existing phylogenetic method in order to improve its performance. We showed analytically and experimentally that when DCM is used in conjunction with polynomial-time distance-based methods, it improves the accuracy of the trees reconstructed. In this paper, we discuss a variant of DCM, which we call DCM2. DCM2 is designed to be used with phylogenetic methods whose objective is the solution of NP-hard optimization problems. We show that DCM2 can be used to accelerate searches for Maximum Parsimony trees. We also motivate the need for solutions to NP-hard optimization problems by showing that on some very large and important datasets, the most popular (and presumably best performing) polynomial-time distance methods have poor accuracy.

ISMB '99. Proceedings of the 7th International Conference on Intelligent Systems for Molecular Biology. Heidelberg, Germany, August 6-10, 1999.
Genomics via optical mapping. III: Contiging genomic DNA.
T Anantharaman, B Mishra, D Schwartz

In this paper, we describe our algorithmic approach to aligning, or contiging, a set of restriction maps created from the images of individual genomic (uncloned) DNA molecules digested by restriction enzymes. Generally, these DNA segments are 1-4 Mb in size. The goal is to devise contiging algorithms capable of producing high-quality composite maps rapidly and in a scalable manner. The resulting software is a key component of our physical mapping automation tools and has been used to create complete maps of various microorganisms (E. coli, P. falciparum and D. radiodurans). Experimental results match known sequence data.

Identity by descent genome segmentation based on single nucleotide polymorphism distributions.
T W Blackwell, E Rouchka, D J States

In the course of our efforts to build extended regions of human genomic sequence by assembling individual BAC sequences, we have encountered several instances where a region of the genome has been sequenced independently using reagents derived from two different individuals. Comparing these sequences allows us to analyze the frequency and distribution of single nucleotide polymorphisms (SNPs) in the human genome. The observed transition/transversion frequencies are consistent with a biological origin for the sequence discrepancies, and this suggests that the data produced by large sequencing centers are accurate enough to be used as the basis for SNP analysis. The observed distribution of single nucleotide polymorphisms in the human genome is not uniform. An apparent duplication in the human genome extending over more than 130 kb between chromosomes 1p34 and 16p13 is reported. Independently derived sequences covering these regions are more than 99.9% identical, indicating that this duplication event must have occurred quite recently. FISH mapping results reported by the relevant laboratories indicate that the human population may be polymorphic for this duplication. We present a population genetic theory for the expected distribution of SNPs and derive an algorithm for probabilistically segmenting genomic sequence into regions that are identical by descent (IBD) between two individuals based on this theory and the observed locations of polymorphisms. Based on these methods and a random mating model for the human population, estimates are made for the mutation rate in the human genome.
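One way to phrase probabilistic IBD segmentation is as a two-state hidden Markov model over a per-base mismatch track, decoded with the Viterbi algorithm. This is a minimal sketch under assumed parameters: a near-zero difference rate inside IBD segments, a higher rate outside them, and a small switching probability. It is not the authors' population-genetic model, whose details the abstract does not give. The functions `mismatch_track` and `segment_ibd` are hypothetical names.

```python
import math


def mismatch_track(seq1, seq2):
    """1 where two aligned sequences differ (a candidate SNP), 0 where they agree."""
    return [int(a != b) for a, b in zip(seq1, seq2)]


def segment_ibd(mismatch, p_ibd=1e-5, p_non=1e-2, p_switch=1e-4):
    """Viterbi path of a two-state HMM: 0 = identical by descent, 1 = not IBD.

    p_ibd / p_non: assumed per-base difference rates in each state.
    p_switch: assumed per-base probability of changing state.
    """
    if not mismatch:
        return []
    emit = [(math.log(1 - p_ibd), math.log(p_ibd)),  # state 0: IBD
            (math.log(1 - p_non), math.log(p_non))]  # state 1: independent haplotypes
    stay, switch = math.log(1 - p_switch), math.log(p_switch)
    score = [math.log(0.5) + emit[s][mismatch[0]] for s in (0, 1)]
    back = []  # back[k][s]: best predecessor of state s at position k + 1
    for x in mismatch[1:]:
        ptr, new = [], []
        for s in (0, 1):
            same, other = score[s] + stay, score[1 - s] + switch
            ptr.append(s if same >= other else 1 - s)
            new.append(max(same, other) + emit[s][x])
        back.append(ptr)
        score = new
    state = 0 if score[0] >= score[1] else 1
    path = [state]
    for ptr in reversed(back):  # trace back from the last position
        state = ptr[state]
        path.append(state)
    return path[::-1]


# Synthetic track: 5 kb without differences, 1 kb with one SNP every 50 bp, 5 kb without.
obs = [0] * 5000 + ([0] * 49 + [1]) * 20 + [0] * 5000
path = segment_ibd(obs)
```

The switching penalty keeps isolated differences (for example sequencing errors) from fragmenting an IBD segment, while a dense run of SNPs is labelled as non-IBD as a block.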

Using the Fisher kernel method to detect remote protein homologies.
T Jaakkola, M Diekhans, D Haussler

A new method, called the Fisher kernel method, for detecting remote protein homologies is introduced and shown to perform well in classifying protein domains by SCOP superfamily. The method is a variant of support vector machines using a new kernel function. The kernel function is derived from a hidden Markov model. The general approach of combining generative models like HMMs with discriminative methods such as support vector machines may have applications in other areas of biosequence analysis as well.
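The next block is a minimal sketch of a Fisher kernel. It assumes the simplest possible generative model: a single-state HMM that emits residues independently with probabilities theta. This stands in for the profile HMM used in the paper. Each Fisher-score component is assumed to take the emission-parameter form U_a = n_a / theta_a - n, where n_a is the (expected) count of residue a and n is the expected number of emissions from the state. With one state these counts are exact. The Fisher information matrix is approximated by the identity, so the kernel is a plain dot product of score vectors. The resulting kernel values would then be passed to a support vector machine, which is not shown.

```python
def fisher_score(seq, theta):
    """Gradient-style feature vector of a sequence under a one-state emission model.

    theta: dict residue -> emission probability (assumed to sum to 1).
    Component for residue a: count(a) / theta[a] - len(seq).
    """
    n = len(seq)
    counts = {a: 0 for a in theta}
    for residue in seq:
        counts[residue] += 1
    return [counts[a] / theta[a] - n for a in sorted(theta)]


def fisher_kernel(x, y, theta):
    """K(x, y) = U_x . U_y, i.e. the Fisher information replaced by the identity."""
    ux, uy = fisher_score(x, theta), fisher_score(y, theta)
    return sum(p * q for p, q in zip(ux, uy))


theta = {"A": 0.5, "B": 0.5}  # toy two-letter alphabet
k_same = fisher_kernel("AA", "AA", theta)      # 8.0
k_opposite = fisher_kernel("AA", "BB", theta)  # -8.0
k_typical = fisher_kernel("AB", "AA", theta)   # 0.0: "AB" matches theta exactly
```

A sequence whose composition matches the model exactly has a zero score vector. The kernel therefore compares sequences by how they pull on the model parameters, which is the generative-plus-discriminative combination the abstract describes.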

A dataset generator for whole genome shotgun sequencing.
G Myers

Simulated data sets have been found to be useful in developing software systems because (1) they allow one to study the effect of a particular phenomenon in isolation, and (2) one has complete information about the true solution against which to measure the results of the software. In developing a software suite for assembling a whole human genome shotgun data set, we have developed a simulator, celsim, that permits one to describe and stochastically generate a target DNA sequence with a variety of repeat structures, to further generate polymorphic variants if desired, and to generate a shotgun data set that might be sampled from the target sequence(s). We have found the tool invaluable and quite powerful, yet the design is extremely simple, employing a special type of stochastic grammar.
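The grammar-driven design can be imitated with a toy generator. This is a minimal sketch, not celsim's actual grammar or input syntax, which the abstract does not show. It uses a hypothetical rule Target → Segment^n, Segment → Unique | Repeat, where Repeat is a mutated copy of one shared repeat element. Fixed-length reads are then sampled uniformly from either strand. Polymorphic variants are not modelled. Every read keeps its true start and strand, so the output doubles as the known true solution against which an assembler's results can be measured.

```python
import random

BASES = "ACGT"
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def random_dna(rng, n):
    return "".join(rng.choice(BASES) for _ in range(n))


def mutate(rng, seq, rate):
    """Copy `seq`, substituting each base with probability `rate`."""
    return "".join(rng.choice([c for c in BASES if c != b]) if rng.random() < rate else b
                   for b in seq)


def generate_target(rng, n_segments, repeat_unit, p_repeat=0.3,
                    unique_len=(50, 200), rate=0.02):
    """Toy stochastic rule: each segment is either fresh unique sequence or a
    mutated copy of the shared repeat element."""
    parts = []
    for _ in range(n_segments):
        if rng.random() < p_repeat:
            parts.append(mutate(rng, repeat_unit, rate))
        else:
            parts.append(random_dna(rng, rng.randint(*unique_len)))
    return "".join(parts)


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def shotgun(rng, target, n_reads, read_len):
    """Sample reads uniformly from either strand, recording the true origin."""
    reads = []
    for _ in range(n_reads):
        start = rng.randint(0, len(target) - read_len)
        read = target[start:start + read_len]
        forward = rng.random() < 0.5
        reads.append((start, forward, read if forward else reverse_complement(read)))
    return reads


rng = random.Random(42)  # fixed seed so the data set is reproducible
repeat = random_dna(rng, 300)
target = generate_target(rng, 20, repeat)
reads = shotgun(rng, target, 50, 100)
```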
