Pub Date: 2007-01-01 | DOI: 10.1142/9781860947995_0023
Mahdi Belcaid, Anne Bergeron, A. Chateau, C. Chauve, Yannick Gingras, G. Poisson, M. Vendette
Genomes evolve through both mutations and large-scale events, such as inversions, translocations, duplications and losses, that modify the structure of a set of chromosomes. To study these large-scale events, the first task is to select, in different genomes, subsequences that are considered “equivalent”. Many approaches have been used to identify equivalent sequences, based on biological experiments, gene annotations, or sequence alignments. These techniques suffer from a variety of drawbacks that often make it impossible for independent researchers to reproduce the datasets used in the studies, or to adapt them to newly sequenced genomes. In this paper, we show that carefully selected small probes can be used efficiently to construct such datasets. Once a set of probes has been identified and published, datasets for whole-genome comparisons can be produced, and reproduced, with elementary algorithms; decisions about what counts as an occurrence of a probe in a genome can be scrutinized and reevaluated; and the structure of a newly sequenced genome can be obtained rapidly, without the need for gene annotations or intensive computations.
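The probe-based pipeline described above can be sketched in a few lines. This is a hypothetical illustration, not the authors' implementation: it assumes exact matching on both strands, keeps only probes with exactly one occurrence, and uses made-up probe names and sequences. The output is the signed order of probes along a genome, which is the kind of input rearrangement analyses work with.

```python
# Hypothetical sketch of probe-based "virtual hybridization": locate each probe
# by exact match on either strand and report the signed order of probes.
# Exact matching and the single-occurrence rule are illustrative assumptions.

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq: str) -> str:
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def find_all(text: str, pattern: str) -> list[int]:
    """Start positions of all (possibly overlapping) exact occurrences."""
    hits, start = [], text.find(pattern)
    while start != -1:
        hits.append(start)
        start = text.find(pattern, start + 1)
    return hits


def signed_probe_order(genome: str, probes: dict[str, str]) -> list[str]:
    """Order probes by position; '+' = forward strand, '-' = reverse strand.

    Probes found zero times or more than once (both strands counted) are
    dropped, so every kept probe is a single unambiguous anchor.
    """
    anchors = []
    for name, probe in probes.items():
        forward = find_all(genome, probe)
        reverse = find_all(genome, reverse_complement(probe))
        if len(forward) + len(reverse) != 1:
            continue  # absent, repeated, or palindromic probe
        if forward:
            anchors.append((forward[0], "+" + name))
        else:
            anchors.append((reverse[0], "-" + name))
    anchors.sort()
    return [label for _, label in anchors]


genome = "ATGGTTTAAGC"
probes = {"a": "AAGC", "b": "CCAT", "c": "CCCC"}
print(signed_probe_order(genome, probes))  # ['-b', '+a']
```

Running the same probe set on a second genome yields a second signed sequence. Because both the probes and the occurrence rule are explicit, anyone can regenerate the comparison dataset, or tighten the rule (for example, by allowing a bounded number of mismatches).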
Title: Exploring Genome Rearrangements using Virtual Hybridization | Proceedings of the ... Asia-Pacific bioinformatics conference, pp. 205-214 | https://doi.org/10.1142/9781860947995_0023
Pub Date: 2007-01-01 | DOI: 10.1142/9781860947995_0006
M. Comin, L. Parida
We address the problem of detecting consensus motifs that occur, with subtle variations, across multiple sequences. These are usually functional domains in DNA sequences, such as transcription factor binding sites or other regulatory sites. The general problem is considered difficult, and various benchmark datasets serve as litmus tests for different computational methods. We present a method centered on unsupervised combinatorial pattern discovery, with parameters chosen through a careful statistical analysis of consensus motifs. The method works well on the benchmark data and is general enough to be extended to a scenario in which the variation in the consensus motif includes indels as well as mutations. We also present some results on the detection of transcription factor binding sites in human DNA sequences. Availability: The system will be made available at www.research.ibm.com/computationalgenomics.
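To make “subtle variation” concrete, the sketch below solves the classic (l, d) motif formulation by brute force: report every l-mer that occurs with at most d mismatches in every input sequence. This is an assumed textbook formulation chosen for illustration, not the combinatorial pattern-discovery method of this paper, and it models substitutions only (no indels).

```python
from itertools import product


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def occurs_within(motif: str, seq: str, d: int) -> bool:
    """True if some length-l window of seq is within Hamming distance d of motif."""
    l = len(motif)
    return any(hamming(motif, seq[i:i + l]) <= d for i in range(len(seq) - l + 1))


def brute_force_motifs(seqs: list[str], l: int, d: int, alphabet: str = "ACGT") -> list[str]:
    """All l-mers occurring, with at most d mismatches, in every sequence.

    Exhaustive over len(alphabet) ** l candidates, so only practical for small l;
    this exponential blow-up is why smarter discovery methods are needed.
    """
    found = []
    for letters in product(alphabet, repeat=l):
        motif = "".join(letters)
        if all(occurs_within(motif, s, d) for s in seqs):
            found.append(motif)
    return found


print(brute_force_motifs(["ACGTT", "TACGA", "GGACG"], l=3, d=0))  # ['ACG']
```

Raising d to 1 on the same input admits many more candidates, which illustrates why the statistical choice of parameters matters for separating real motifs from chance matches.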
Title: Subtle Motif Discovery for Detection of DNA Regulatory Sites | Proceedings of the ... Asia-Pacific bioinformatics conference, pp. 27-36 | https://doi.org/10.1142/9781860947995_0006
Pub Date: 2007-01-01 | DOI: 10.1142/9781860947995_0004
D. Huson, Alexander F. Auch, Qi Ji, S. Schuster
In metagenomics, the goal is to analyze the genomic content of a sample of organisms collected from a common habitat. One approach is to apply large-scale random shotgun sequencing to obtain a collection of DNA reads from the sample. These reads are then compared against databases of known sequences, such as NCBI-nr or NCBI-nt, in an attempt to identify the taxonomic content of the sample. We introduce a new program, MEGAN (Meta Genome ANalyzer), that generates species profiles from such sequencing data by assigning reads to taxa of the NCBI taxonomy using a straightforward assignment algorithm. The approach is illustrated on a number of datasets obtained with both sequencing-by-synthesis and Sanger sequencing technology, including metagenomic data from a mammoth bone, a portion of the Sargasso Sea dataset, and several complete microbial test genomes used for validation purposes.
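MEGAN's published descriptions assign each read to the lowest common ancestor (LCA) of the taxa it hits. The sketch below shows that rule in its simplest form, assuming a toy child-to-parent taxonomy (hypothetical names, not real NCBI taxon IDs) and a precomputed list of hit taxa per read; it ignores the score and support filters a practical tool would apply.

```python
from collections import Counter

# Toy taxonomy as child -> parent links (hypothetical; not real NCBI taxon IDs).
PARENT = {
    "root": None,
    "Bacteria": "root",
    "Proteobacteria": "Bacteria",
    "Firmicutes": "Bacteria",
    "Escherichia": "Proteobacteria",
    "Salmonella": "Proteobacteria",
    "Bacillus": "Firmicutes",
}


def lineage(taxon: str) -> list[str]:
    """Path from a taxon up to the root, inclusive (most specific first)."""
    path = [taxon]
    while PARENT[taxon] is not None:
        taxon = PARENT[taxon]
        path.append(taxon)
    return path


def lowest_common_ancestor(taxa: list[str]) -> str:
    """Most specific taxon that is an ancestor of (or equal to) every input taxon."""
    common = lineage(taxa[0])
    for other in taxa[1:]:
        ancestors = set(lineage(other))
        common = [t for t in common if t in ancestors]
    return common[0]


def assign_reads(hits: dict[str, list[str]]) -> Counter:
    """Count reads per assigned taxon; reads without hits count as 'unassigned'."""
    counts = Counter()
    for taxa in hits.values():
        counts[lowest_common_ancestor(taxa) if taxa else "unassigned"] += 1
    return counts


hits = {
    "read1": ["Escherichia", "Salmonella"],  # conserved between two genera
    "read2": ["Escherichia"],                # specific hit
    "read3": ["Escherichia", "Bacillus"],    # spans two phyla
    "read4": [],                             # no database match
}
print(assign_reads(hits))
```

The design choice is conservative: a read matching several taxa is placed only as deep in the taxonomy as all its hits agree, so widely conserved sequences land near the root instead of inflating the count of one species.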
Title: Metagenome Analysis Using MEGAN | Proceedings of the ... Asia-Pacific bioinformatics conference, pp. 7-16 | https://doi.org/10.1142/9781860947995_0004
Pub Date: 2007-01-01 | DOI: 10.1142/9781860947995_0025
W. Xu
In this paper, we study the exact probability distribution of the number of cycles c in the breakpoint graph of two random genomes with n genes or markers and χ1 and χ2 linear chromosomes, respectively. The genomic distance d between the two genomes is d = n − c. In the limit, we find that the expectation of d is …
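The relation d = n − c can be explored numerically. The sketch below is a Monte Carlo illustration under a simplified model that is an assumption, not the paper's setting: each genome is a uniformly random perfect matching of the 2n gene extremities (all adjacencies, no telomeres, so no linear chromosomes χ1, χ2). In that simplified model the expected number of cycles is known exactly, E[c] = Σ_{k=1..n} 1/(2k−1), which grows roughly like ½ ln n, so the simulation can be checked against it.

```python
import random


def count_cycles(matching_a: list[int], matching_b: list[int]) -> int:
    """Count cycles in the union of two perfect matchings on 2n points.

    matching[x] is the partner of extremity x; each matching encodes the
    adjacencies of one genome, so the union is the breakpoint graph.
    """
    seen = [False] * len(matching_a)
    cycles = 0
    for start in range(len(matching_a)):
        if seen[start]:
            continue
        cycles += 1
        x = start
        while not seen[x]:
            seen[x] = True
            y = matching_a[x]  # follow an adjacency of genome A
            seen[y] = True
            x = matching_b[y]  # then an adjacency of genome B
    return cycles


def random_matching(n: int, rng: random.Random) -> list[int]:
    """Uniformly random perfect matching on 2n extremities."""
    points = list(range(2 * n))
    rng.shuffle(points)
    partner = [0] * (2 * n)
    for i in range(0, 2 * n, 2):
        a, b = points[i], points[i + 1]
        partner[a], partner[b] = b, a
    return partner


def mean_distance(n: int, trials: int = 2000, seed: int = 0) -> float:
    """Monte Carlo estimate of E[d] with d = n - c (simplified circular model)."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        c = count_cycles(random_matching(n, rng), random_matching(n, rng))
        total += n - c
    return total / trials


def expected_distance(n: int) -> float:
    """Exact E[d] in the simplified model, using E[c] = sum 1/(2k-1)."""
    return n - sum(1 / (2 * k - 1) for k in range(1, n + 1))


print(mean_distance(50), expected_distance(50))
```

Because E[c] grows only logarithmically, two random genomes are almost maximally distant (d close to n), which is why the distance between randomly constructed genomes is a useful baseline for judging whether an observed distance reflects real evolutionary signal.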
Title: The Distance Between Randomly Constructed Genomes | Proceedings of the ... Asia-Pacific bioinformatics conference, pp. 227-236 | https://doi.org/10.1142/9781860947995_0025
Pub Date: 2006-11-28 | DOI: 10.1142/9781860947995_0021
Y. Suh, Wentian Li
We study in detail a particular statistical method in genetic case-control analysis, labeled “genotype-based association”, in which the results of two tests, one assuming a dominant model and one assuming a recessive model, are combined into one optimal output. This method differs both from allele-based association, which artificially doubles the sample size, and from the direct χ² test on a 3-by-2 contingency table, which may overestimate the degrees of freedom. We conclude that the comparative advantage (or disadvantage) of the genotype-based test over the allele-based test depends mainly on two parameters: the allele frequency difference δ and the Hardy-Weinberg disequilibrium coefficient difference δε. Six different situations, called “phases”, characterized by the two χ² test statistics of the allele-based and genotype-based tests, are well separated in the phase diagram parameterized by δ and δε. For two major groups of phases, a single parameter θ = tan⁻¹(δ/δε) achieves an almost perfect phase separation. We also applied the analytic result to several types of disease models, and show that for dominant and additive models, genotype-based tests are favored over allele-based tests.
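The two competing statistics can be computed directly from genotype counts. This sketch assumes that “combined into one optimal output” means taking the larger of the dominant-model and recessive-model χ² statistics (a MAX-type test); the paper may calibrate the combination differently. Counts are given as (n0, n1, n2), the number of individuals carrying 0, 1 or 2 copies of the risk allele, and the example numbers are invented.

```python
def chi2_2x2(a: float, b: float, c: float, d: float) -> float:
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return n * (a * d - b * c) ** 2 / denom if denom else 0.0


def allele_test(cases, controls) -> float:
    """Allele-based test: every individual contributes two alleles (doubled sample)."""
    def allele_counts(g):
        n0, n1, n2 = g
        return 2 * n2 + n1, 2 * n0 + n1  # (risk alleles, other alleles)

    r1, o1 = allele_counts(cases)
    r0, o0 = allele_counts(controls)
    return chi2_2x2(r1, o1, r0, o0)


def genotype_test(cases, controls):
    """Genotype-based test: max of dominant and recessive 2x2 statistics."""
    # Dominant model: carriers of >= 1 risk allele vs non-carriers.
    dominant = chi2_2x2(cases[1] + cases[2], cases[0],
                        controls[1] + controls[2], controls[0])
    # Recessive model: risk homozygotes vs everyone else.
    recessive = chi2_2x2(cases[2], cases[0] + cases[1],
                         controls[2], controls[0] + controls[1])
    return max(dominant, recessive), dominant, recessive


cases, controls = (30, 50, 20), (50, 40, 10)
print(allele_test(cases, controls))    # about 9.6
print(genotype_test(cases, controls))  # max is the dominant statistic, about 8.33
```

Note that the maximum of two correlated statistics does not follow a plain χ² distribution with one degree of freedom, so its p-value needs an adjusted null distribution; the allele-based statistic, in turn, is only well calibrated when Hardy-Weinberg equilibrium holds, which is where δε enters.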
Title: Genotype-Based Case-Control Analysis, Violation of Hardy-Weinberg Equilibrium, and Phase Diagrams | Proceedings of the ... Asia-Pacific bioinformatics conference, pp. 185-194 | https://doi.org/10.1142/9781860947995_0021
Pub Date: 2005-12-01 | DOI: 10.1142/9781860947292_0009
Derek A. Ruths, L. Nakhleh
The central role that phylogeny plays in biology, and its pervasiveness in comparative genomics studies, have led researchers to develop a plethora of methods for its accurate reconstruction. Most phylogeny reconstruction methods, however, assume a single tree underlying a given sequence alignment. While this is a good first approximation in many cases, a tree may not always model the evolutionary history of a set of organisms. When events such as interspecific recombination occur, different regions in the alignment may have different underlying trees. Accurate reconstruction of the evolutionary history of a set of sequences therefore requires recombination detection, followed by separate analyses of the nonrecombining regions. Besides aiding accurate phylogenetic analyses, detecting recombination helps in understanding one of the main mechanisms of bacterial genome diversification. In this paper, we introduce RECOMP, an accurate and fast method for detecting recombination events in a sequence alignment. The method slides a fixed-width window across the alignment and determines the presence of recombination events based on a combination of topology and parsimony score differences in neighboring windows. On several synthetic and biological datasets, our method runs much faster than existing tools, with accuracy comparable to that of the best available method.
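The sliding-window idea can be illustrated on the smallest interesting case, four taxa, where only three unrooted topologies exist. The sketch below is a simplified, hypothetical stand-in for RECOMP, not its actual scoring: for each window it computes the Fitch parsimony score of every quartet topology, keeps the most parsimonious one, and flags window starts where that best topology changes from the previous window.

```python
# Candidate unrooted topologies for four taxa (indices 0..3):
# ((0,1),(2,3)), ((0,2),(1,3)), ((0,3),(1,2)).
TOPOLOGIES = [((0, 1), (2, 3)), ((0, 2), (1, 3)), ((0, 3), (1, 2))]


def fitch_merge(x: set, y: set) -> tuple[set, int]:
    """One Fitch step: intersection if non-empty (cost 0), else union (cost 1)."""
    common = x & y
    return (common, 0) if common else (x | y, 1)


def column_score(column: str, topology) -> int:
    """Fitch parsimony score of one alignment column on a quartet topology."""
    (a, b), (c, d) = topology
    left, c1 = fitch_merge({column[a]}, {column[b]})
    right, c2 = fitch_merge({column[c]}, {column[d]})
    _, c3 = fitch_merge(left, right)  # join the two cherries across the internal edge
    return c1 + c2 + c3


def best_topology_per_window(seqs: list[str], width: int, step: int):
    """For each window: (start, index of most parsimonious topology, all scores)."""
    results = []
    for start in range(0, len(seqs[0]) - width + 1, step):
        columns = ["".join(s[p] for s in seqs) for p in range(start, start + width)]
        scores = [sum(column_score(col, t) for col in columns) for t in TOPOLOGIES]
        best = min(range(len(TOPOLOGIES)), key=scores.__getitem__)  # ties -> lowest index
        results.append((start, best, scores))
    return results


def candidate_breakpoints(results) -> list[int]:
    """Window starts where the best topology differs from the previous window."""
    return [cur[0] for prev, cur in zip(results, results[1:]) if cur[1] != prev[1]]


# First half groups taxa {0,1}|{2,3}; second half groups {0,2}|{1,3}.
seqs = ["AAAAGGGG", "AAAATTTT", "CCCCGGGG", "CCCCTTTT"]
windows = best_topology_per_window(seqs, width=4, step=4)
print(candidate_breakpoints(windows))  # [4]
```

A real detector also has to decide whether a change in topology or score is large enough to be significant rather than noise, and must handle more than four taxa, where the number of topologies grows super-exponentially; those parts are out of scope for this sketch.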
Title: RECOMP: A Parsimony-Based Method for Detecting Recombination | Proceedings of the ... Asia-Pacific bioinformatics conference, pp. 59-68 | https://doi.org/10.1142/9781860947292_0009
Pub Date: 2005-12-01 | DOI: 10.1142/9781860947292_0029
Ching-Tai Chen, Hsin-Nan Lin, K. Wu, Ting-Yi Sung, W. Hsu
Local structure prediction can facilitate ab initio structure prediction, protein threading, and remote homology detection. However, previous approaches to local structure prediction suffer from poor accuracy. In this paper, we propose a knowledge-based prediction method that assigns a measure called the local match rate to each position of an amino acid sequence to estimate the confidence of the prediction at that position. To remedy predictions with low local match rates, we use a neural network prediction method. The resulting hybrid method, HYPLOSP (HYbrid method to Protein LOcal Structure Prediction), combines our knowledge-based method with a neural network method. We test the method on two different structural alphabets and evaluate it by QN, a measure analogous to Q3 in secondary structure prediction. The experimental results show that our method yields a significant improvement over previous studies.
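The hybrid decision rule and the QN measure can be sketched as follows. This is an illustration under stated assumptions: the knowledge-based prediction, per-position local match rates, and neural-network prediction are taken as given strings and lists; the fallback threshold (0.5 here) is a hypothetical parameter; and QN is computed, by analogy with Q3, as the fraction of positions whose structural-alphabet letter is predicted correctly.

```python
def hybrid_predict(kb_pred: str, match_rate: list[float], nn_pred: str,
                   threshold: float = 0.5) -> str:
    """Per position, keep the knowledge-based letter when its local match rate
    reaches the (hypothetical) threshold; otherwise use the neural-network letter."""
    return "".join(
        kb if rate >= threshold else nn
        for kb, rate, nn in zip(kb_pred, match_rate, nn_pred)
    )


def q_n(predicted: str, observed: str) -> float:
    """Fraction of positions whose structural-alphabet letter is correct (Q3 analogue)."""
    return sum(p == o for p, o in zip(predicted, observed)) / len(observed)


combined = hybrid_predict("AABB", [0.9, 0.2, 0.8, 0.1], "CCCC")
print(combined)               # ACBC
print(q_n(combined, "ACBB"))  # 0.75
```

In practice the threshold would be tuned on held-out data, trading the precision of the knowledge-based predictions at well-matched positions against the coverage of the neural network elsewhere.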
Title: A Knowledge-Based Approach to Protein Local Structure Prediction | Proceedings of the ... Asia-Pacific bioinformatics conference, pp. 257-266 | https://doi.org/10.1142/9781860947292_0029
Pub Date: 2005-12-01 | DOI: 10.1142/9781860947292_0034
J. Aßfalg, H. Kriegel, Peer Kröger, Peter Kunath, A. Pryakhin, M. Renz
The analysis of time series data is of central importance for pharmacogenomics, since experimental evaluations are usually based on observations of time-dependent reactions or behaviors of organisms. Thus, data mining in time series databases is an important instrument for understanding the effects of drugs on individuals. However, the complex nature of time series poses a big challenge for effective and efficient data mining. In this paper, we focus on the detection of temporal dependencies between different time series: we introduce the novel analysis concept of threshold queries and its semi-supervised extension, which supports parameter setting by applying training datasets. Basically, threshold queries report those time series exceeding a user-defined query threshold at certain time frames. For semi-supervised threshold queries, the corresponding threshold is automatically adjusted to the characteristics of the data set, as captured by the training dataset. To support threshold queries efficiently, we present a new access method that exploits the fact that only partial information of the time series is required at query time. In an extensive experimental evaluation, we demonstrate the performance of our solution and show that semi-supervised threshold queries applied to gene expression data are very worthwhile. Data mining in time series data is a key step in the study of drugs and their impact on living systems, including the discovery, design, usage, modes of action, and metabolism of chemically defined therapeutics and toxic agents. In particular, the analysis of time series data is of great practical importance for pharmacogenomics. Classical time series analysis is based on techniques for forecasting or for identifying patterns (e.g. trend analysis or seasonality). The similarity between time series, e.g. similar movements of time series, plays a key role in the analysis.
In this paper, we introduce a novel and important similarity query type, which we call the threshold query. Given a time series database DB, a query time series Q, and a query threshold τ, a threshold query TSQ_DB(Q, τ) returns the time series X ∈ DB having the most similar sequence of time intervals in which the time series values are above τ. In other words, we assume that each time series X ∈ DB ∪ {Q} is transformed into a sequence of disjoint time intervals covering only those values of X that are (strictly) above the threshold τ. A threshold query then returns, for a given query object Q, the object X ∈ DB having the most similar sequence of time intervals. Note that the exact values of the time series are not considered; we are only interested in whether the time series is above or below the given threshold τ. In other words, the concept of threshold queries enables us to focus only on the duration of certain events indicated by increased time series amplitudes, while the degree of the corresponding amplitudes is ignored.
This advantage is very beneficial, especially if we want to compare time …
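The transformation behind a threshold query can be sketched as a plain linear scan. Several details are assumptions rather than the paper's definitions: time is treated as discrete steps (no interpolation between samples), each interval (start, end) is treated as a 2-D point, and the dissimilarity of two interval sequences is taken as the symmetric mean of nearest-point Euclidean distances. The paper's efficient access method and its semi-supervised choice of τ are not reproduced here.

```python
import math


def threshold_intervals(series: list[float], tau: float) -> list[tuple[int, int]]:
    """Maximal runs of time steps whose value is strictly above tau, as (start, end)."""
    intervals, start = [], None
    for t, value in enumerate(series):
        if value > tau and start is None:
            start = t
        elif value <= tau and start is not None:
            intervals.append((start, t - 1))
            start = None
    if start is not None:
        intervals.append((start, len(series) - 1))
    return intervals


def interval_distance(a, b) -> float:
    """Assumed dissimilarity: symmetric mean of nearest-point distances in 2-D."""
    if not a and not b:
        return 0.0
    if not a or not b:
        return math.inf

    def one_way(xs, ys):
        return sum(min(math.dist(x, y) for y in ys) for x in xs) / len(xs)

    return (one_way(a, b) + one_way(b, a)) / 2


def threshold_query(db: dict[str, list[float]], query: list[float], tau: float) -> str:
    """Name of the database series whose above-tau intervals best match the query's."""
    q = threshold_intervals(query, tau)
    return min(db, key=lambda name: interval_distance(q, threshold_intervals(db[name], tau)))


db = {"x": [0, 6, 6, 0, 0], "y": [0, 0, 0, 7, 7]}
print(threshold_query(db, [0, 5, 5, 0, 0], tau=1))  # x
```

Only the interval endpoints matter: series "x" matches the query although its amplitudes differ, which is exactly the property of ignoring the degree of the amplitudes described above.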
Title: Semi-Supervised Threshold Queries on Pharmacogenomics Time Sequences | Proceedings of the ... Asia-Pacific bioinformatics conference, pp. 307-316 | https://doi.org/10.1142/9781860947292_0034
Pub Date: 2005-12-01 | DOI: 10.1142/9781860947292_0013
T. Akutsu, M. Hayashida, W. Ching, M. Ng
This paper considers the problem of finding control strategies for Boolean networks, which have been used as a model of genetic networks. We show that finding a control strategy leading to the desired global state is NP-hard even if there is only one control node in the network. This result justifies existing exponential-time algorithms for finding control strategies for probabilistic Boolean networks. On the other hand, we show that the problem can be solved in polynomial time if the network has a tree structure.
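An exhaustive search makes the exponential cost concrete. The sketch below assumes a deterministic Boolean network whose next state is given by a hypothetical update function of the current internal state and the current control inputs; it explores, step by step, every state reachable at time t and reconstructs a control sequence that reaches the target at exactly time m. Each layer can hold up to 2^n states, which is the blow-up the NP-hardness result says cannot be avoided in general. It is not the paper's polynomial-time algorithm for tree-structured networks.

```python
from itertools import product
from typing import Callable, Optional

State = tuple[int, ...]


def find_control_sequence(update: Callable[[State, State], State], n_controls: int,
                          x0: State, target: State, m: int) -> Optional[list[State]]:
    """Return control vectors u(0..m-1) driving x0 to target at time m, or None."""
    layers: list[dict] = [{x0: None}]  # layers[t]: state -> (previous state, control)
    for _ in range(m):
        nxt = {}
        for state in layers[-1]:
            for u in product((0, 1), repeat=n_controls):  # 2**n_controls choices
                s2 = update(state, u)
                if s2 not in nxt:
                    nxt[s2] = (state, u)
        layers.append(nxt)
    if target not in layers[m]:
        return None
    controls, state = [], target
    for t in range(m, 0, -1):  # walk predecessor links back to x0
        prev, u = layers[t][state]
        controls.append(u)
        state = prev
    controls.reverse()
    return controls


# Hypothetical 2-node network: node 1 copies the control input, node 2 copies node 1.
def update(x: State, u: State) -> State:
    return (u[0], x[0])


print(find_control_sequence(update, 1, (0, 0), (1, 1), 2))  # [(1,), (1,)]
```

Storing a predecessor for each reachable state keeps memory proportional to the number of distinct states rather than the number of control sequences, but in the worst case both are exponential in the size of the network.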
Title: On the Complexity of Finding Control Strategies for Boolean Networks | Proceedings of the ... Asia-Pacific bioinformatics conference, pp. 99-108 | https://doi.org/10.1142/9781860947292_0013
Pub Date: 2005-12-01 | DOI: 10.1142/9781860947292_0003
M. Waterman
Optical mapping, an innovative technology, is used to infer a genome map of the locations of short sequence patterns called restriction sites. The technology, developed by David Schwartz, allows visualization of the restriction maps of randomly located single DNA molecules, each around a million base pairs in length. The genome map is constructed by overlapping these shorter maps. The mathematical and computational challenges come from modeling the measurement errors and from the process of map assembly.
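What an optical map records can be illustrated with an in silico digest: the ordered fragment lengths between restriction sites. The sketch below uses the BamHI recognition site GGATCC, which BamHI cuts after the first base (G^GATCC), as an example; the toy sequence is invented, and the Gaussian sizing-error model is a hypothetical placeholder for the measurement errors mentioned above, not the model used in practice.

```python
import random


def restriction_map(seq: str, site: str, cut_offset: int = 0) -> tuple[list[int], list[int]]:
    """Return (cut positions, ordered fragment lengths) for an in silico digest.

    cut_offset is where the enzyme cuts within its site (0 = just before the site).
    """
    cuts = []
    pos = seq.find(site)
    while pos != -1:
        cuts.append(pos + cut_offset)
        pos = seq.find(site, pos + 1)
    bounds = [0] + cuts + [len(seq)]
    fragments = [end - start for start, end in zip(bounds, bounds[1:])]
    return cuts, fragments


def measured_sizes(fragments: list[int], rel_sd: float = 0.05, seed: int = 0) -> list[float]:
    """Hypothetical sizing-error model: Gaussian noise proportional to fragment length."""
    rng = random.Random(seed)
    return [max(0.0, rng.gauss(f, rel_sd * f)) for f in fragments]


seq = "AAGGATCCTTGGATCCA"
print(restriction_map(seq, "GGATCC"))                # ([2, 10], [2, 8, 7])
print(restriction_map(seq, "GGATCC", cut_offset=1))  # ([3, 11], [3, 8, 6])
print(measured_sizes([3, 8, 6]))
```

Assembly then amounts to finding overlaps between such noisy fragment-length sequences from different molecules, which is why an explicit error model is needed to decide when two fragments "match".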
Title: Whole Genome Optical Mapping | Proceedings of the ... Asia-Pacific bioinformatics conference, p. 5 | https://doi.org/10.1142/9781860947292_0003