Proceedings of the ... Asia-Pacific bioinformatics conference: latest articles

Exploring Genome Rearrangements using Virtual Hybridization
Pub Date : 2007-01-01 DOI: 10.1142/9781860947995_0023
Mahdi Belcaid, Anne Bergeron, A. Chateau, C. Chauve, Yannick Gingras, G. Poisson, M. Vendette
Genomes evolve through both mutations and large-scale events, such as inversions, translocations, duplications and losses, that modify the structure of a set of chromosomes. To study these large-scale events, the first task is to select, in different genomes, sub-sequences that are considered “equivalent”. Many approaches have been used to identify equivalent sequences, based on biological experiments, gene annotations, or sequence alignments. These techniques suffer from a variety of drawbacks that often make it impossible for independent researchers to reproduce the datasets used in the studies, or to adapt them to newly sequenced genomes. In this paper, we show that carefully selected small probes can be efficiently used to construct datasets. Once a set of probes is identified, and published, datasets for whole genome comparisons can be produced, and reproduced, with elementary algorithms; decisions about what is considered an occurrence of a probe in a genome can be criticized and reevaluated; and the structure of a newly sequenced genome can be obtained rapidly, without the need for gene annotations or intensive computations.
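The idea of producing a dataset from a published probe set with elementary algorithms can be sketched as follows. This is a minimal illustration and not the authors' procedure: it assumes that an occurrence is an exact match on either strand, and the names `reverse_complement` and `virtual_hybridization`, as well as the toy probes and genomes, are hypothetical.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A, C, G, T only)."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(complement[base] for base in reversed(seq))


def virtual_hybridization(genome, probes):
    """Map probes onto a genome and return them as a signed, ordered list.

    Assumption: an occurrence is an exact match. A probe found on the
    forward strand is reported as +id, on the reverse strand as -id.
    """
    hits = []
    for probe_id, probe in probes.items():
        patterns = {probe: +1}
        # setdefault keeps +1 for palindromic probes, so they are not counted twice.
        patterns.setdefault(reverse_complement(probe), -1)
        for pattern, sign in patterns.items():
            start = genome.find(pattern)
            while start != -1:
                hits.append((start, sign * probe_id))
                start = genome.find(pattern, start + 1)
    hits.sort()  # order the hits by position along the genome
    return [signed_id for _, signed_id in hits]


probes = {1: "ACCT", 2: "GGAT"}  # toy probe set (hypothetical)
genome_a = "TTACCTTTGGATTT"
genome_b = "TTATCCTTACCTTT"
order_a = virtual_hybridization(genome_a, probes)
order_b = virtual_hybridization(genome_b, probes)
```

The two signed probe orders are the kind of input a rearrangement analysis would compare. Relaxing the exact-match rule (for example, allowing mismatches) is exactly the sort of decision the abstract says can be criticized and reevaluated.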
Pages: 205-214
Citations: 3
Subtle Motif Discovery for Detection of DNA Regulatory Sites
Pub Date : 2007-01-01 DOI: 10.1142/9781860947995_0006
M. Comin, L. Parida
We address the problem of detecting consensus motifs that occur with subtle variations across multiple sequences. These are usually functional domains in DNA sequences, such as transcriptional binding factors or other regulatory sites. The problem in its generality has been considered difficult, and various benchmark data serve as the litmus test for different computational methods. We present a method centered around unsupervised combinatorial pattern discovery. The parameters are chosen using a careful statistical analysis of consensus motifs. This method works well on the benchmark data and is general enough to be extended to a scenario where the variation in the consensus motif includes indels (along with mutations). We also present some results on the detection of transcription binding factors in human DNA sequences. Availability: The system will be made available at www.research.ibm.com/computationalgenomics.
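To make the notion of a consensus motif occurring with subtle variations concrete, the sketch below counts how many sequences contain a candidate motif within a given number of substitutions. It is only an illustration of the problem statement, not the combinatorial pattern discovery method of the paper; the function names and the toy sequences are hypothetical, and indels are not handled.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def occurs_with_mismatches(motif, sequence, max_mismatches):
    """True if some window of the sequence is within max_mismatches of the motif."""
    k = len(motif)
    return any(
        hamming(motif, sequence[i:i + k]) <= max_mismatches
        for i in range(len(sequence) - k + 1)
    )


def motif_support(motif, sequences, max_mismatches):
    """Count the sequences that contain an approximate occurrence of the motif."""
    return sum(occurs_with_mismatches(motif, s, max_mismatches) for s in sequences)


sequences = ["TTACGTTT", "GGACCTGG", "AAAAAAAA"]  # toy input (hypothetical)
support_exact = motif_support("ACGT", sequences, 0)
support_subtle = motif_support("ACGT", sequences, 1)
```

Allowing one substitution raises the support from one sequence to two here, which is the effect that makes subtle motifs hard to separate from background matches.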
Pages: 27-36
Citations: 9
Metagenome Analysis using Megan
Pub Date : 2007-01-01 DOI: 10.1142/9781860947995_0004
D. Huson, Alexander F. Auch, Qi Ji, S. Schuster
In metagenomics, the goal is to analyze the genomic content of a sample of organisms collected from a common habitat. One approach is to apply large-scale random shotgun sequencing techniques to obtain a collection of DNA reads from the sample. This data is then compared against databases of known sequences such as NCBI-nr or NCBI-nt, in an attempt to identify the taxonomical content of the sample. We introduce a new software tool called MEGAN (Meta Genome ANalyzer) that generates species profiles from such sequencing data by assigning reads to taxa of the NCBI taxonomy using a straightforward assignment algorithm. The approach is illustrated by application to a number of datasets obtained using both sequencing-by-synthesis and Sanger sequencing technology, including metagenomic data from a mammoth bone, a portion of the Sargasso Sea data set, and several complete microbial test genomes used for validation purposes.
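The abstract does not spell out the assignment algorithm. One common scheme, assumed here for illustration, places each read at the lowest common ancestor (LCA) of the taxa it matches. The sketch below uses a toy taxonomy stored as a child-to-parent dictionary; the function names, the taxonomy fragment and the read hits are all hypothetical.

```python
from collections import Counter


def lineage(taxon, parent):
    """Return the path from a taxon up to the root, starting with the taxon."""
    path = [taxon]
    while taxon in parent:
        taxon = parent[taxon]
        path.append(taxon)
    return path


def lowest_common_ancestor(taxa, parent):
    """Deepest node that lies on the lineage of every taxon in the list."""
    taxa = list(taxa)
    common = lineage(taxa[0], parent)
    for taxon in taxa[1:]:
        other = set(lineage(taxon, parent))
        common = [node for node in common if node in other]
    return common[0]


def species_profile(read_hits, parent):
    """Count reads per assigned taxon; reads without hits stay unassigned."""
    return Counter(
        lowest_common_ancestor(hits, parent)
        for hits in read_hits.values()
        if hits
    )


# Toy taxonomy (child -> parent) and toy database hits per read.
parent = {
    "Escherichia coli": "Enterobacteriaceae",
    "Salmonella enterica": "Enterobacteriaceae",
    "Enterobacteriaceae": "Bacteria",
    "Bacillus subtilis": "Bacteria",
    "Bacteria": "root",
}
read_hits = {
    "read1": ["Escherichia coli"],
    "read2": ["Escherichia coli", "Salmonella enterica"],
    "read3": ["Escherichia coli", "Bacillus subtilis"],
    "read4": [],
}
profile = species_profile(read_hits, parent)
```

A read with hits in several taxa moves up the tree until all hits agree, so ambiguous reads are reported at a coarser rank rather than forced onto one species.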
Pages: 7-16
Citations: 2
The Distance Between Randomly Constructed Genomes
Pub Date : 2007-01-01 DOI: 10.1142/9781860947995_0025
W. Xu
In this paper, we study the exact probability distribution of the number of cycles c in the breakpoint graph of two random genomes with n genes or markers and χ₁ and χ₂ linear chromosomes, respectively. The genomic distance d between the two genomes is d = n − c. In the limit we find a closed-form expression for the expectation of d, which involves n, χ₁, χ₂ and a logarithmic term.
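The link between the number of cycles and the distance can be illustrated on a simplified case. The sketch below assumes two genomes that each consist of a single circular chromosome over the same n signed genes (the abstract itself treats linear chromosomes), builds the breakpoint graph from the two sets of adjacencies, counts its cycles c, and returns n minus c. The function names and the head/tail encoding of gene ends are illustrative choices.

```python
def adjacencies(genome):
    """Pair up neighbouring gene ends of a circular signed gene order.

    Each gene g has a tail (g, 't') and a head (g, 'h'); a gene read in
    reverse (-g) presents its head first.
    """
    match = {}
    n = len(genome)
    for i in range(n):
        a, b = genome[i], genome[(i + 1) % n]
        right_end = (abs(a), "h" if a > 0 else "t")
        left_end = (abs(b), "t" if b > 0 else "h")
        match[right_end] = left_end
        match[left_end] = right_end
    return match


def count_cycles(genome_a, genome_b):
    """Count alternating cycles in the union of the two adjacency matchings."""
    match_a, match_b = adjacencies(genome_a), adjacencies(genome_b)
    seen = set()
    cycles = 0
    for start in match_a:
        if start in seen:
            continue
        cycles += 1
        x = start
        while x not in seen:
            seen.add(x)
            y = match_a[x]  # follow an adjacency of genome A
            seen.add(y)
            x = match_b[y]  # then an adjacency of genome B
    return cycles


def genomic_distance(genome_a, genome_b):
    """Distance n - c for two circular genomes on the same gene set."""
    return len(genome_a) - count_cycles(genome_a, genome_b)


d_same = genomic_distance([1, 2, 3], [1, 2, 3])
d_inversion = genomic_distance([1, 2, 3], [1, -2, 3])
d_swap = genomic_distance([1, 2, 3], [1, 3, 2])
```

Identical genomes give n cycles of length two and distance zero; every rearrangement that merges cycles increases the distance.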
Pages: 227-236
Citations: 5
Genotype-Based Case-Control Analysis, Violation of Hardy-Weinberg Equilibrium, and Phase Diagrams
Pub Date : 2006-11-28 DOI: 10.1142/9781860947995_0021
Y. Suh, Wentian Li
We study in detail a particular statistical method in genetic case-control analysis, labeled “genotype-based association”, in which the two test results from assuming dominant and recessive models are combined into one optimal output. This method differs both from the allele-based association, which artificially doubles the sample size, and from the direct χ² test on the 3-by-2 contingency table, which may overestimate the degrees of freedom. We conclude that the comparative advantage (or disadvantage) of the genotype-based test over the allele-based test mainly depends on two parameters, the allele frequency difference δ and the Hardy-Weinberg disequilibrium coefficient difference δ_ε. Six different situations, called “phases”, characterized by the two χ² test statistics in the allele-based and genotype-based tests, are well separated in the phase diagram parameterized by δ and δ_ε. For two major groups of phases, a single parameter θ = tan⁻¹(δ/δ_ε) is able to achieve an almost perfect phase separation. We also applied the analytic result to several types of disease models. It is shown that for dominant and additive models, genotype-based tests are favored over allele-based tests.
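The two kinds of test can be compared on a small example. The sketch below computes the Pearson chi-square statistic for the 2-by-2 allele table and for the dominant and recessive genotype tables. Taking the larger of the dominant and recessive statistics as the combined output is an assumption made here for illustration, as are the function names and the genotype counts.

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    if denominator == 0:
        return 0.0
    return n * (a * d - b * c) ** 2 / denominator


def allele_based(cases, controls):
    """Test on allele counts; each person contributes two alleles.

    cases and controls are genotype counts in the order (AA, Aa, aa).
    """
    case_A = 2 * cases[0] + cases[1]
    case_a = 2 * cases[2] + cases[1]
    ctrl_A = 2 * controls[0] + controls[1]
    ctrl_a = 2 * controls[2] + controls[1]
    return chi_square_2x2(case_A, case_a, ctrl_A, ctrl_a)


def genotype_based(cases, controls):
    """Larger of the dominant and recessive statistics (assumed combination rule)."""
    # Dominant model for allele A: carriers (AA or Aa) versus aa.
    dominant = chi_square_2x2(cases[0] + cases[1], cases[2],
                              controls[0] + controls[1], controls[2])
    # Recessive model for allele A: AA versus (Aa or aa).
    recessive = chi_square_2x2(cases[0], cases[1] + cases[2],
                               controls[0], controls[1] + controls[2])
    return max(dominant, recessive)


cases = (30, 50, 20)     # hypothetical genotype counts (AA, Aa, aa)
controls = (20, 50, 30)
allele_stat = allele_based(cases, controls)
genotype_stat = genotype_based(cases, controls)
```

In this toy data set both groups are close to Hardy-Weinberg proportions and the allele-based statistic is the larger one; which test wins in general is what the phase diagram in the abstract describes.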
Pages: 185-194
Citations: 9
RECOMP: A Parsimony-Based Method for Detecting Recombination
Pub Date : 2005-12-01 DOI: 10.1142/9781860947292_0009
Derek A. Ruths, L. Nakhleh
The central role phylogeny plays in biology and its pervasiveness in comparative genomics studies have led researchers to develop a plethora of methods for its accurate reconstruction. Most phylogeny reconstruction methods, though, assume a single tree underlying a given sequence alignment. While a good first approximation in many cases, a tree may not always model the evolutionary history of a set of organisms. When events such as interspecific recombination occur, different regions in the alignment may have different underlying trees. Accurate reconstruction of the evolutionary history of a set of sequences requires recombination detection, followed by separate analyses of the nonrecombining regions. Besides aiding accurate phylogenetic analyses, detecting recombination helps in understanding one of the main mechanisms of bacterial genome diversification. In this paper, we introduce RECOMP, an accurate and fast method for detecting recombination events in a sequence alignment. The method slides a fixed-width window across the alignment and determines the presence of recombination events based on a combination of topology and parsimony score differences in neighboring windows. On several synthetic and biological datasets, our method performs much faster than existing tools with accuracy comparable to the best available method.
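The sliding-window idea can be sketched on the smallest possible case. The code below is a simplified illustration and not RECOMP itself: it assumes an alignment of exactly four sequences, scores the three possible quartet topologies in each window with Fitch parsimony, and reports window starts where the best topology differs from that of the previous window. It uses the topology change only and ignores parsimony score differences; all names and the toy alignment are hypothetical.

```python
# The three unrooted topologies for four taxa, as pairs of sister taxa.
QUARTET_TOPOLOGIES = [((0, 1), (2, 3)), ((0, 2), (1, 3)), ((0, 3), (1, 2))]


def fitch_join(left, right):
    """One Fitch step: return (state set of the parent, added cost)."""
    shared = left & right
    if shared:
        return shared, 0
    return left | right, 1


def parsimony_score(columns, topology):
    """Parsimony score of a list of alignment columns on one quartet topology."""
    (a, b), (c, d) = topology
    total = 0
    for column in columns:
        left, cost_left = fitch_join({column[a]}, {column[b]})
        right, cost_right = fitch_join({column[c]}, {column[d]})
        _, cost_root = fitch_join(left, right)
        total += cost_left + cost_right + cost_root
    return total


def best_topology(columns):
    """Index of the topology with the lowest score (first one wins ties)."""
    scores = [parsimony_score(columns, t) for t in QUARTET_TOPOLOGIES]
    return scores.index(min(scores))


def candidate_breakpoints(alignment, width, step):
    """Window starts at which the best topology changes between neighbours."""
    columns = list(zip(*alignment))
    starts = range(0, len(columns) - width + 1, step)
    best = [best_topology(columns[s:s + width]) for s in starts]
    return [
        start
        for start, previous, current in zip(starts[1:], best, best[1:])
        if previous != current
    ]


# Toy alignment: the first half groups rows (0,1) vs (2,3), the second (0,2) vs (1,3).
alignment = ["AAAAAAAA", "AAAACCCC", "CCCCAAAA", "CCCCCCCC"]
breakpoints = candidate_breakpoints(alignment, width=4, step=4)
```

With more than four sequences the per-window step would need a real tree search, which is where the running time of such methods is spent.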
Pages: 59-68
Citations: 8
A Knowledge-Based Approach to Protein Local Structure Prediction
Pub Date : 2005-12-01 DOI: 10.1142/9781860947292_0029
Ching-Tai Chen, Hsin-Nan Lin, K. Wu, Ting-Yi Sung, W. Hsu
Local structure prediction can facilitate ab initio structure prediction, protein threading, and remote homology detection. However, previous approaches to local structure prediction suffer from poor accuracy. In this paper, we propose a knowledge-based prediction method that assigns a measure called the local match rate to each position of an amino acid sequence to estimate the confidence of our approach. To remedy prediction results with low local match rates, we use a neural network prediction method. The result is a hybrid prediction method, HYPLOSP (HYbrid method to Protein LOcal Structure Prediction), which combines our knowledge-based method with a neural network method. We test the method on two different structural alphabets and evaluate it by QN, which is similar to Q3 in secondary structure prediction. The experimental results show that our method yields a significant improvement over previous studies.
Pages: 257-266
Citations: 1
Semi-Supervised Threshold Queries on Pharmacogenomics Time Sequences
Pub Date : 2005-12-01 DOI: 10.1142/9781860947292_0034
J. Aßfalg, H. Kriegel, Peer Kröger, Peter Kunath, A. Pryakhin, M. Renz
The analysis of time series data is of capital importance for pharmacogenomics since the experimental evaluations are usually based on observations of time-dependent reactions or behaviors of organisms. Thus, data mining in time series databases is an important instrument towards understanding the effects of drugs on individuals. However, the complex nature of time series poses a big challenge for effective and efficient data mining. In this paper, we focus on the detection of temporal dependencies between different time series: we introduce the novel analysis concept of threshold queries and its semi-supervised extension, which supports the parameter setting by applying training datasets. Basically, threshold queries report those time series exceeding a user-defined query threshold at certain time frames. For semi-supervised threshold queries the corresponding threshold is automatically adjusted to the characteristics of the data set and the training dataset, respectively. In order to support threshold queries efficiently, we present a new access method which uses the fact that only partial information of the time series is required at query time. In an extensive experimental evaluation we demonstrate the performance of our solution and show that semi-supervised threshold queries applied to gene expression data are very worthwhile. Data mining in time series data is a key step within the study of drugs and their impact on living systems, including the discovery, design, usage, modes of action, and metabolism of chemically defined therapeutics and toxic agents. In particular, the analysis of time series data is of great practical importance for pharmacogenomics. Classical time series analysis is based on techniques for forecasting or for identifying patterns (e.g. trend analysis or seasonality). The similarity between time series, e.g. similar movements of time series, plays a key role for the analysis. In this paper, we introduce a novel but very important similarity query type which we call threshold query. Given a time series database DB, a query time series Q, and a query threshold τ, a threshold query TSQ_DB(Q, τ) returns those time series X ∈ DB having the most similar sequence of time intervals in which the time series values are above τ. In other words, we assume that each time series X ∈ DB ∪ {Q} is transformed into a sequence of disjoint time intervals covering only those values of X that are (strictly) above the threshold τ. Then, a threshold query returns for a given query object Q that object X ∈ DB having the most similar sequence of time intervals. Let us note that the exact values of the time series are not considered; rather, we are only interested in whether the time series is above or below a given threshold τ. In other words, the concept of threshold queries enables us to focus only on the duration of certain events indicated by increased time series amplitudes, while the degree of the corresponding amplitudes is ignored. This advantage is very beneficial, especially if we want to compare time …
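The transformation described above, from a time series to the intervals in which it lies strictly above τ, can be sketched as follows. The abstract does not specify how two interval sequences are compared, so the distance used here (the number of time points covered by one sequence but not the other) is a stand-in chosen for illustration; the function names and the toy database are hypothetical, and no index structure is involved.

```python
def intervals_above(series, tau):
    """Maximal index intervals (start, end), inclusive, where values exceed tau."""
    intervals = []
    start = None
    for i, value in enumerate(series):
        if value > tau:
            if start is None:
                start = i
        elif start is not None:
            intervals.append((start, i - 1))
            start = None
    if start is not None:
        intervals.append((start, len(series) - 1))
    return intervals


def covered_points(intervals):
    """Set of time indices covered by a list of inclusive intervals."""
    return {t for start, end in intervals for t in range(start, end + 1)}


def interval_distance(x, y, tau):
    """Stand-in distance: time points above tau in exactly one of the two series."""
    points_x = covered_points(intervals_above(x, tau))
    points_y = covered_points(intervals_above(y, tau))
    return len(points_x ^ points_y)


def threshold_query(database, query, tau):
    """Name of the database series whose above-tau intervals best match the query."""
    return min(database, key=lambda name: interval_distance(database[name], query, tau))


database = {
    "gene_a": [0, 2, 3, 0, 5],
    "gene_b": [4, 4, 0, 0, 0],
    "gene_c": [0, 9, 9, 0, 0],
}
query = [0, 3, 2, 0, 0]
nearest = threshold_query(database, query, tau=1)
```

The series `gene_c` matches the query even though its amplitudes are three times larger, because only the time frames above the threshold are compared.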
Pages: 307-316
Citations: 2
On the Complexity of Finding Control Strategies for Boolean Networks
Pub Date : 2005-12-01 DOI: 10.1142/9781860947292_0013
T. Akutsu, M. Hayashida, W. Ching, M. Ng
This paper considers the problem of finding control strategies for Boolean networks, which have been used as a model of genetic networks. This paper shows that finding a control strategy leading to the desired global state is NP-hard even if there is only one control node in the network. This result justifies existing exponential time algorithms for finding control strategies for probabilistic Boolean networks. On the other hand, this paper shows that the problem can be solved in polynomial time if the network has a tree structure.
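The exponential-time baseline that the hardness result refers to can be illustrated by exhaustive search over the inputs of a single control node. The sketch below is a generic brute-force search, not an algorithm from the paper; the function names, the two-node toy network and its update rule are hypothetical.

```python
from itertools import product


def find_control_sequence(step, initial, target, horizon):
    """Try all 2**horizon binary control sequences; return the first that works.

    step(state, u) gives the next global state for control value u.
    Returns None if no sequence of the given length reaches the target.
    """
    for controls in product((0, 1), repeat=horizon):
        state = initial
        for u in controls:
            state = step(state, u)
        if state == target:
            return controls
    return None


def toy_step(state, u):
    """Toy network: x1 copies the control node, x2 becomes (x1 AND u)."""
    x1, x2 = state
    return (u, x1 & u)


strategy = find_control_sequence(toy_step, initial=(0, 0), target=(1, 1), horizon=2)
too_short = find_control_sequence(toy_step, initial=(0, 0), target=(1, 1), horizon=1)
```

The search space doubles with every additional time step, which is why special structure, such as the tree-structured networks mentioned in the abstract, matters for tractability.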
本文研究了布尔网络的控制策略问题,其中布尔网络已被用作遗传网络的模型。本文表明,即使网络中只有一个控制节点,寻找导致理想全局状态的控制策略也是np困难的。这一结果证明了现有的指数时间算法用于寻找概率布尔网络的控制策略。另一方面,本文证明了如果网络具有树形结构,则问题可以在多项式时间内解决。
{"title":"On the Complexity of Finding Control Strategies for Boolean Networks","authors":"T. Akutsu, M. Hayashida, W. Ching, M. Ng","doi":"10.1142/9781860947292_0013","DOIUrl":"https://doi.org/10.1142/9781860947292_0013","url":null,"abstract":"This paper considers a problem of finding control strategies for Boolean networks, where Boolean networks have been used as a model of genetic networks. This paper shows that finding a control strategy leading to the desired global state is NP-hard even if there is only one control node in the network. This result justifies existing exponential time algorithms for finding control strategies for probabilistic Boolean networks. On the other hand, this paper shows that the problem can be solved in polynomial time if the network has a tree structure.","PeriodicalId":74513,"journal":{"name":"Proceedings of the ... Asia-Pacific bioinformatics conference","volume":"30 1","pages":"99-108"},"PeriodicalIF":0.0,"publicationDate":"2005-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"91501188","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 4
Whole Genome Optical Mapping
Pub Date : 2005-12-01 DOI: 10.1142/9781860947292_0003
M. Waterman
Optical mapping, an innovative technology, is used to infer a genome-wide map of the locations of short sequence patterns called restriction sites. The technology, developed by David Schwartz, allows the visualization of maps of randomly located single molecules around a million base pairs in length. The genome map is constructed by overlapping these shorter maps. The mathematical and computational challenges come from modeling the measurement errors and from the process of map assembly.
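As a rough illustration of what a restriction map records, the sketch below locates every occurrence of a recognition sequence in a toy DNA string and reports the resulting fragment lengths. It is an idealized, error-free view and not the optical mapping method itself, which works from measured fragment sizes on single molecules. The function names and the toy sequence are assumptions made for this example; `GAATTC` is the recognition sequence of the enzyme EcoRI, and cutting exactly at the start of each site is a simplification.

```python
def restriction_map(sequence, site):
    """Return the 0-based start position of every occurrence of `site`."""
    positions = []
    start = sequence.find(site)
    while start != -1:
        positions.append(start)
        start = sequence.find(site, start + 1)
    return positions

def fragment_lengths(sequence, site):
    """Lengths of the pieces obtained by cutting at the start of each site."""
    cuts = [0] + restriction_map(sequence, site) + [len(sequence)]
    return [right - left for left, right in zip(cuts, cuts[1:])]

toy_sequence = "AAGAATTCCCGGGAATTCTT"  # hypothetical 20-base molecule
print(restriction_map(toy_sequence, "GAATTC"))   # [2, 12]
print(fragment_lengths(toy_sequence, "GAATTC"))  # [2, 10, 8]
```

Ordered fragment lengths like these are what overlapping single-molecule maps would be compared on; in real data each length carries measurement error, which is where the modeling challenges mentioned in the abstract arise.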
Citations: 0