Pub Date: 2007-11-01 DOI: 10.1109/BIBMW.2007.4425408
L. Zaslavsky, Yīmíng Bào, T. Tatusova
The rapid growth of genome sequence data calls for better exploratory analysis tools that are both fast and robust. Users need data representations that serve different purposes, from viewing the overall structure and coverage of the data to examining evolutionary processes during a particular season. Our approach is to construct hierarchies of data representations and to give users representations that can be adapted to specific goals. This can be done efficiently because a typical influenza dataset has a low estimated Kolmogorov (box) dimension. Multi-scale methodologies allow interactive visual representation of the dataset and accelerate computations through importance sampling. Our tree visualization approach is based on subtree aggregation with subscale resolution, and it allows interactive refinement and coarsening of subtree views. For importance sampling of large influenza datasets, we construct sets of well-scattered points (ε-nets). A tree built for a global sample provides a coarse-level representation of the whole dataset, and it can be complemented by trees that show more detail in chosen areas. To reflect both the global dataset structure and local details correctly, we perform local refinement gradually, using a multiscale hierarchy of ε-nets. Our hierarchical representations also allow fast metadata searching.
Title: Multiresolution approaches to representation and visualization of large influenza virus sequence datasets. Venue: 2007 IEEE International Conference on Bioinformatics and Biomedicine Workshops.
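The abstract does not say how the sets of well-scattered points are built, so the following is only a minimal sketch under stated assumptions: sequences are aligned to equal length, distance is the Hamming distance, and each net is built by a simple greedy pass. The names `hamming`, `greedy_net`, and `net_hierarchy` are hypothetical and are not taken from the paper. Each finer net is seeded with the coarser one, which gives a nested multiscale hierarchy.

```python
from typing import Callable, List, Sequence


def hamming(a: str, b: str) -> int:
    """Count mismatched positions between two aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    return sum(x != y for x, y in zip(a, b))


def greedy_net(
    points: Sequence[str],
    eps: float,
    dist: Callable[[str, str], float] = hamming,
    seed: Sequence[int] = (),
) -> List[int]:
    """Return indices of a greedy net at scale eps.

    A point becomes a center only if it is farther than eps from every
    center chosen so far. As a result, centers are pairwise more than eps
    apart, and every point lies within eps of some center.
    `seed` holds indices that are kept as centers (assumed to be already
    pairwise more than eps apart, e.g. a coarser net).
    """
    centers = list(seed)
    chosen = set(centers)
    for i, p in enumerate(points):
        if i in chosen:
            continue
        if all(dist(p, points[c]) > eps for c in centers):
            centers.append(i)
            chosen.add(i)
    return centers


def net_hierarchy(points: Sequence[str], scales: Sequence[float]) -> List[List[int]]:
    """Build nested nets for strictly decreasing scales (coarse to fine)."""
    if any(a <= b for a, b in zip(scales, scales[1:])):
        raise ValueError("scales must be strictly decreasing")
    levels: List[List[int]] = []
    seed: List[int] = []
    for eps in scales:
        seed = greedy_net(points, eps, seed=seed)
        levels.append(seed)
    return levels
```

For example, `net_hierarchy(seqs, [2, 1])` returns a coarse sample of `seqs` followed by a finer sample that contains it. A tree built on the coarse level would correspond to the global view, and the finer level could be restricted to a region of interest for local refinement.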
Pub Date: 2007-11-01 DOI: 10.1109/BIBMW.2007.4425406
Zhou-Jun Li, Lijuan Zhang, Huo-Wang Chen
Feature (gene) selection is a frequently used preprocessing step for cancer classification in microarray gene expression data analysis. Widely used gene selection approaches rely mainly on filter methods, which are usually considered very effective and efficient for high-dimensional data. This paper reviews existing filter methods and evaluates representative algorithms on microarray data through an extensive experimental study. Surprisingly, the experimental results show that filter methods are not very effective on microarray data. We analyze the cause of this result and outline basic ideas for potential solutions.
Title: Are filter methods very effective in gene selection of microarray data? Venue: 2007 IEEE International Conference on Bioinformatics and Biomedicine Workshops.
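To make the term concrete, a filter method scores each gene on its own, independently of any classifier, and keeps the top-ranked genes. The abstract does not name the algorithms it evaluates, so the sketch below uses one common criterion, the absolute Welch t-statistic between two classes, purely as an illustration. The function names and the dictionary-based data layout are assumptions.

```python
from math import sqrt
from statistics import mean, variance
from typing import Dict, List, Sequence


def t_score(values: Sequence[float], labels: Sequence[int]) -> float:
    """Absolute Welch t-statistic of one gene between classes 1 and 0.

    Assumes each class has at least two samples (needed for the variance).
    """
    pos = [v for v, y in zip(values, labels) if y == 1]
    neg = [v for v, y in zip(values, labels) if y == 0]
    diff = abs(mean(pos) - mean(neg))
    se = sqrt(variance(pos) / len(pos) + variance(neg) / len(neg))
    if se == 0:
        # No within-class spread: perfectly separating if the means differ.
        return float("inf") if diff > 0 else 0.0
    return diff / se


def select_top_genes(
    expr: Dict[str, Sequence[float]], labels: Sequence[int], k: int
) -> List[str]:
    """Rank genes by t_score (highest first) and return the k best names."""
    scores = {gene: t_score(vals, labels) for gene, vals in expr.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]
```

Because each gene is scored in isolation, a ranking like this cannot account for redundancy or interaction between genes, a commonly cited limitation of univariate filters. The abstract itself does not state what cause the authors identify for the weak results.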
Pub Date: 2007-11-01 DOI: 10.1109/BIBMW.2007.4425394
O. Daescu, Y. Cheung
We study the following problem. Given a simple polytope $S$ in $\mathbb{R}^3$ with a total of $n$ edges, and a query point $s$ on $S$, find a shortest path from $s$ to the boundary of the convex hull $CH(S)$ of $S$ that does not go through the interior of $S$. The problem appears in structural proteomics, in the computation of shape descriptors that measure the depth of a point on a surface. We present an algorithm with running time $O\bigl(n^3 \bigl(\lambda(n) \log(n/\varepsilon)/\varepsilon^4 + \log(np) \log(n \log p)\bigr)\bigr)$ that finds a path from $s$ to the boundary of $CH(S)$ whose length is at most $(1 + \varepsilon)$ times the length of a shortest path from $s$ to the boundary of $CH(S)$.
Title: Point to face shortest paths in simple polytopes with applications in structural proteomics. Venue: 2007 IEEE International Conference on Bioinformatics and Biomedicine Workshops.
Pub Date: 1900-01-01 DOI: 10.1109/BIBMW.2007.4425397
K. Knapp, A. Rahaman, Y.-P.P. Chen
In an attempt to improve automated gene prediction in the untranslated region of a gene, we completed an in-depth analysis of the minimum free energy of 8,689 sub-genetic DNA sequences. We expanded Zhang's classification model and classified each sub-genetic sequence into one of 27 possible motifs. We calculated the minimum free energy for each motif to explore statistical features that correlate with biologically relevant sub-genetic sequences. If biologically relevant sub-genetic sequences fall into distinct free energy quanta, it may be possible to characterize a motif by its minimum free energy. Proper characterization of motifs can lead to a better understanding of automated gene finding, gene variability, and the role DNA structure plays in gene network regulation. Our analysis determined (1) the average free energy value for exons, introns, and other biologically relevant sub-genetic sequences; (2) that these subsequences do not exist in distinct energy quanta; (3) that introns, however, exist in a tightly coupled average minimum free energy quantum compared to all other biologically relevant sub-genetic sequence types; (4) that single-exon genes demonstrate higher stability than exons that span the entire coding sequence as part of a multi-exon gene; and (5) that all motif types contain a global free energy minimum at approximately nucleotide position 1,000 before reaching a plateau. These results should be relevant to biochemists and bioinformaticians seeking to understand the relationship between sub-genetic sequences and the information behind them.
Title: Non-quantized minimum free energy in untranslated region exons. Venue: 2007 IEEE International Conference on Bioinformatics and Biomedicine Workshops.