2007 IEEE International Conference on Bioinformatics and Biomedicine Workshops: Latest Articles


Multiresolution approaches to representation and visualization of large influenza virus sequence datasets

2007 IEEE International Conference on Bioinformatics and Biomedicine Workshops

Pub Date : 2007-11-01 DOI: 10.1109/BIBMW.2007.4425408

L. Zaslavsky, Yīmíng Bào, T. Tatusova

The rapid growth of genome sequence data calls for better exploratory analysis tools that run quickly and robustly. Users need data representations that serve different purposes, from viewing overall structure and data coverage to following evolutionary processes during a particular season. Our approach is to construct hierarchies of data representations and to provide users with representations adaptable to specific goals. This can be done efficiently because a typical influenza dataset has low estimated values of the Kolmogorov (box) dimension. Multiscale methodologies allow interactive visual representation of the dataset and accelerate computations through importance sampling. Our tree visualization approach is based on subtree aggregation with subscale resolution, which allows interactive refinement and coarsening of subtree views. For importance sampling of large influenza datasets, we construct sets of well-scattered points (ε-nets). While a tree built for a global sample provides a coarse-level representation of the whole dataset, it can be complemented by trees showing more detail in chosen areas. To reflect both global dataset structure and local details correctly, we perform local refinement gradually, using a multiscale hierarchy of ε-nets. Our hierarchical representations allow fast metadata searching.
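The abstract does not say how the sets of well-scattered points are built. As a rough illustration only, the sketch below uses a standard greedy construction: a point joins the net only if it lies farther than a chosen scale from every point already kept. The Hamming distance on equal-length sequences, the function names, and the seeding of each finer net with the coarser one are all assumptions made for this example, not details taken from the paper.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def hamming(a: str, b: str) -> int:
    """Number of mismatched positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def greedy_net(points: Sequence[T], eps: float,
               dist: Callable[[T, T], float],
               seed: Sequence[T] = ()) -> List[T]:
    """Greedy net: keep a point only if it is farther than eps from every kept point.

    Afterwards every input point is within eps of some net point.
    """
    net = list(seed)
    for p in points:
        if all(dist(p, q) > eps for q in net):
            net.append(p)
    return net


def net_hierarchy(points: Sequence[T], scales: Sequence[float],
                  dist: Callable[[T, T], float]) -> List[Tuple[float, List[T]]]:
    """Nested nets for decreasing scales; each finer net extends the coarser one."""
    nets: List[Tuple[float, List[T]]] = []
    current: List[T] = []
    for eps in sorted(scales, reverse=True):
        current = greedy_net(points, eps, dist, seed=current)
        nets.append((eps, list(current)))
    return nets
```

Calling `net_hierarchy(sequences, [3, 1], hamming)` would return a coarse sample first and a finer sample second, with the coarse sample contained in the finer one. A tree built on the coarse sample could then be refined locally by adding points from the finer levels, which is the kind of gradual refinement the abstract describes.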



Citations: 1

Are filter methods very effective in gene selection of microarray data?

2007 IEEE International Conference on Bioinformatics and Biomedicine Workshops

Pub Date : 2007-11-01 DOI: 10.1109/BIBMW.2007.4425406

Zhou-Jun Li, Lijuan Zhang, Huo-Wang Chen

Feature (gene) selection is a frequently used preprocessing step for cancer classification in microarray gene expression data analysis. Widely used gene selection approaches mainly rely on filter methods, which are usually considered very effective and efficient for high-dimensional data. This paper reviews the existing filter methods and evaluates representative algorithms on microarray data through an extensive experimental study. Surprisingly, the experimental results show that filter methods are not very effective on microarray data. We analyze the cause of this result and outline basic ideas for potential solutions.
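The abstract does not name the filter methods it evaluates. For readers unfamiliar with the idea, the sketch below shows one common filter criterion, chosen here purely as an illustration: score each gene independently by the absolute Welch t-statistic between two classes and keep the top k. The function names and the data layout (a list of per-sample rows with 0/1 labels) are assumptions for this example.

```python
import math
from statistics import mean, variance
from typing import List, Sequence


def welch_t(a: Sequence[float], b: Sequence[float]) -> float:
    """Welch t-statistic between two groups (each needs at least two values)."""
    diff = mean(a) - mean(b)
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    if se == 0.0:
        # Both groups are constant: no separation, or perfect separation.
        return 0.0 if diff == 0 else math.copysign(math.inf, diff)
    return diff / se


def select_genes(X: Sequence[Sequence[float]], y: Sequence[int], k: int) -> List[int]:
    """Return the indices of the k genes with the largest |t| score.

    X holds one row per sample and one column per gene; y holds 0/1 class labels.
    """
    n_genes = len(X[0])
    scores = []
    for g in range(n_genes):
        pos = [row[g] for row, label in zip(X, y) if label == 1]
        neg = [row[g] for row, label in zip(X, y) if label == 0]
        scores.append(abs(welch_t(pos, neg)))
    # Rank genes by score, highest first, and keep the top k.
    return sorted(range(n_genes), key=lambda g: scores[g], reverse=True)[:k]
```

Because each gene is scored on its own, a filter of this kind is cheap but ignores interactions and redundancy among genes, which is one plausible reason such methods could underperform on microarray data.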


Citations: 5

Point to face shortest paths in simple polytopes with applications in structural proteomics

2007 IEEE International Conference on Bioinformatics and Biomedicine Workshops

Pub Date : 2007-11-01 DOI: 10.1109/BIBMW.2007.4425394

O. Daescu, Y. Cheung

We study the following problem. Given a simple polytope $S$ in $\mathbb{R}^3$ with a total of $n$ edges and a query point $s$ on $S$, find a shortest path from $s$ to the boundary of the convex hull $CH(S)$ of $S$ that does not go through the interior of $S$. The problem appears in structural proteomics, in the computation of shape descriptors that measure the depth of a point on a surface. We present an algorithm with running time $O(n^3(\lambda(n)\log(n/\varepsilon)/\varepsilon^4 + \log(np)\log(n\log p)))$ that finds a path from $s$ to the boundary of $CH(S)$ whose length is at most $(1+\varepsilon)$ times the length of a shortest such path.
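The approximation guarantee in the last sentence can be written compactly. The symbols $\pi$ (the returned path), $|\pi|$ (its length), and $d^{*}(s)$ (the length of a shortest admissible path from $s$ to the boundary of $CH(S)$) are notation introduced here for illustration; $\varepsilon$ is the approximation tolerance from the abstract.

```latex
% Assumed notation: \pi is the path returned by the algorithm,
% d^{*}(s) is the length of a shortest path from s to the boundary
% of CH(S) that avoids the interior of S.
\[
  d^{*}(s) \;\le\; |\pi| \;\le\; (1 + \varepsilon)\, d^{*}(s)
\]
```

The lower bound holds because $\pi$ is itself an admissible path, so it cannot be shorter than a shortest one.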


Citations: 0

Non-quantized minimum free energy in untranslated region exons

2007 IEEE International Conference on Bioinformatics and Biomedicine Workshops

Pub Date : 2007 DOI: 10.1109/BIBMW.2007.4425397

K. Knapp, A. Rahaman, Y.-P.P. Chen

In an attempt to improve automated gene prediction in the untranslated region of a gene, we completed an in-depth analysis of the minimum free energy of 8,689 sub-genetic DNA sequences. We expanded Zhang's classification model and classified each sub-genetic sequence into one of 27 possible motifs. We calculated the minimum free energy for each motif to explore statistical features that correlate with biologically relevant sub-genetic sequences. If biologically relevant sub-genetic sequences fall into distinct free energy quanta, it may be possible to characterize a motif by its minimum free energy. Proper characterization of motifs can lead to a better understanding of automated gene finding, gene variability, and the role DNA structure plays in gene network regulation. Our analysis determined: (1) the average free energy value for exons, introns, and other biologically relevant sub-genetic sequences; (2) that these subsequences do not exist in distinct energy quanta; (3) that introns nevertheless occupy a tightly coupled average minimum free energy quantum compared with all other biologically relevant sub-genetic sequence types; (4) that single-exon genes demonstrate higher stability than exons that span the entire coding sequence as part of a multi-exon gene; and (5) that all motif types contain a global free energy minimum at approximately nucleotide position 1,000 before reaching a plateau. These results should be relevant to biochemists and bioinformaticians seeking to understand the relationship between sub-genetic sequences and the information behind them.



Citations: 1
