2010 IEEE International Conference on Bioinformatics and Biomedicine (BIBM): Latest Articles


Integrated analysis of the various types of microarray data using linear-mixed effects models

2010 IEEE International Conference on Bioinformatics and Biomedicine (BIBM)

Pub Date : 2010-12-01 DOI: 10.1109/BIBM.2010.5706594

Sung-Gon Yi, T. Park

As the scale of experiments increases, it is common to combine various types of microarrays, such as paired and non-paired microarrays from different laboratories or hospitals. It is therefore important to analyze the microarray data together to derive a combined conclusion after accounting for heterogeneity among data sets. One of the main objectives of a microarray experiment is to identify differentially expressed genes among the different experimental groups. We propose a linear mixed-effects (LMe) model for the integrated analysis of heterogeneous microarray data sets. The proposed LMe model is illustrated using data from 133 microarrays collected at three different hospitals. Through simulation studies, we compared the proposed LMe model approach with the meta-analysis and ANOVA model approaches. The LMe model approach was shown to provide higher power than the other approaches.
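For orientation, a generic linear mixed-effects model can be written as below. This is the textbook form, not necessarily the authors' exact specification. Reading the fixed effects as experimental-group effects and the random effects as hospital-level or patient-level effects is an assumption made here for illustration only.

```latex
y = X\beta + Zb + \varepsilon, \qquad b \sim \mathcal{N}(0, G), \qquad \varepsilon \sim \mathcal{N}(0, R)
```

Here $y$ is the vector of expression measurements for one gene, $X\beta$ collects the fixed effects, $Zb$ collects the random effects, and $\varepsilon$ is the residual error. Heterogeneity among data sets is absorbed by the random-effect covariance $G$ rather than by separate per-study analyses.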


Citations: 0

Unsupervised integration of multiple protein disorder predictors

2010 IEEE International Conference on Bioinformatics and Biomedicine (BIBM)

Pub Date : 2010-12-01 DOI: 10.1109/BIBM.2010.5706534

Ping Zhang, Z. Obradovic

Studies of intrinsically disordered proteins, which lack a stable tertiary structure but still have important biological functions, critically rely on computational methods that predict this property from sequence information. Although a number of fairly successful models for predicting protein disorder were developed over the last decade, the quality of their predictions is limited by the number of available cases of confirmed disorder. To estimate protein disorder from protein sequences more reliably, an iterative algorithm is proposed that integrates the predictions of multiple disorder models without relying on any protein sequences with confirmed disorder annotation. The iterative method alternates between maximum a posteriori (MAP) estimation of the disorder prediction and maximum-likelihood (ML) estimation of the quality of the individual disorder predictors. Experiments on data used at the Critical Assessment of Techniques for Protein Structure Prediction (CASP7 and CASP8) have shown the effectiveness of the proposed algorithm.
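The alternation between a MAP step and an ML step can be sketched with a small latent-class model. This is a minimal illustration and not the authors' algorithm: it assumes binary per-residue calls, conditionally independent predictors, and a quality measure made of sensitivity and specificity. The function name `integrate_predictors` and the toy votes are hypothetical.

```python
from math import prod


def integrate_predictors(votes, n_iter=50, eps=1e-6):
    """votes[k][i] is predictor k's binary call (1 = disordered) for residue i.

    Returns (posterior, quality); quality[k] = (sensitivity, specificity).
    No confirmed labels are used at any point.
    """
    n_pred, n_res = len(votes), len(votes[0])
    # Initial guess: fraction of predictors calling each residue disordered.
    post = [sum(v[i] for v in votes) / n_pred for i in range(n_res)]
    quality = []
    for _ in range(n_iter):
        # ML step: re-estimate the prior and each predictor's quality
        # from the current soft labels.
        pos = sum(post)
        neg = n_res - pos
        prior = min(max(pos / n_res, eps), 1 - eps)
        quality = []
        for v in votes:
            sens = (sum(p * x for p, x in zip(post, v)) + eps) / (pos + 2 * eps)
            spec = (sum((1 - p) * (1 - x) for p, x in zip(post, v)) + eps) / (neg + 2 * eps)
            quality.append((sens, spec))
        # MAP step: posterior probability of disorder for every residue.
        new_post = []
        for i in range(n_res):
            like1 = prior * prod(s if v[i] else 1 - s for v, (s, _) in zip(votes, quality))
            like0 = (1 - prior) * prod(1 - c if v[i] else c for v, (_, c) in zip(votes, quality))
            new_post.append(like1 / (like1 + like0))
        post = new_post
    return post, quality


# Toy example: two predictors that agree and one noisier predictor.
votes = [
    [1, 1, 1, 1, 0, 0, 0, 0],
    [1, 1, 1, 1, 0, 0, 0, 0],
    [1, 1, 0, 1, 0, 1, 0, 0],
]
posterior, quality = integrate_predictors(votes)
labels = [int(p > 0.5) for p in posterior]
```

Thresholding the posterior at 0.5 gives the MAP label for each residue, and the estimated sensitivity and specificity indicate how much weight each predictor effectively receives.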


Citations: 3

Characterization of structural features for small regulatory RNAs in Escherichia coli genomes

2010 IEEE International Conference on Bioinformatics and Biomedicine (BIBM)

Pub Date : 2010-12-01 DOI: 10.1109/BIBM.2010.5706538

S. Le, B. Shapiro

Small regulatory RNAs are highly abundant noncoding RNAs (ncRNAs) found in bacterial genomes. These small regulatory ncRNAs (sRNAs) can regulate the synthesis of proteins by mediating mRNA transcription, translation and stability. Furthermore, they also control the activity of specific proteins by binding to them. In this study, we present a general computational approach for identifying the distinct structure of sRNAs in the Escherichia coli (E. coli) genome using a quantitative measure, Ediff, which is the energy difference between the optimal structure folded from a sequence segment and its corresponding optimal restrained structure, in which all base pairings formed in the original optimal structure are excluded. Our results indicate that most of the known small ncRNAs in E. coli K12 have very high normalized Ediff scores with high statistical significance. These sRNAs have distinct well-ordered structures that are both thermodynamically stable and uniquely folded.
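The verbal definition of Ediff can be written compactly as follows. The symbols $E_{\mathrm{o}}$ and $E_{\mathrm{r}}$ and the sign convention are assumptions introduced here; the abstract does not fix a notation, and it does not state how the score is normalized.

```latex
E_{\mathrm{diff}}(S) = E_{\mathrm{r}}(S) - E_{\mathrm{o}}(S) \ge 0
```

Here $E_{\mathrm{o}}(S)$ is the free energy of the optimal structure of sequence segment $S$, and $E_{\mathrm{r}}(S)$ is the free energy of the optimal structure found when every base pair of that original optimum is forbidden. Assuming "optimal" means minimum free energy, the restrained fold is a minimum over a smaller set of structures, so $E_{\mathrm{r}}(S) \ge E_{\mathrm{o}}(S)$. A large $E_{\mathrm{diff}}$ then means no competing fold comes close in energy, which matches the abstract's description of structures that are both stable and uniquely folded.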


Citations: 0

Multi-objective evolutionary algorithms based Interpretable Fuzzy models for microarray gene expression data analysis

2010 IEEE International Conference on Bioinformatics and Biomedicine (BIBM)

Pub Date : 2010-12-01 DOI: 10.1109/BIBM.2010.5706582

Zhenyu Wang, V. Palade

We believe the great interpretability of fuzzy models allows fuzzy-based methods to play a very important role in microarray gene expression data analysis, but the advantages offered by fuzzy-based techniques in this application have not yet been fully explored in the literature. In this paper, we construct Multi-Objective Evolutionary Algorithms based Interpretable Fuzzy (MOEAIF) models for microarray gene expression data analysis. Our novel fuzzy models can significantly decrease model complexity and automatically balance the accuracy and interpretability of the models. Experimental studies have shown that relatively simple and small fuzzy rule bases with satisfactory classification performance have been successfully found for challenging microarray gene expression datasets.


Citations: 2

Prediction of Protein-RNA interaction site using SVM-KNN algorithm with spatial information

2010 IEEE International Conference on Bioinformatics and Biomedicine (BIBM)

Pub Date : 2010-12-01 DOI: 10.1109/BIBM.2010.5706539

Wei Chen, Shaowu Zhang, Yong-mei Cheng, Q. Pan

Protein-RNA interactions are vitally important to a number of fundamental cellular processes, including regulation of gene expression (such as RNA splicing, transport and translation), protein synthesis, and ribosome assembly. More detailed information on protein-RNA interactions helps in understanding function annotation and molecular regulatory mechanisms, and knowledge of protein-RNA recognition can also help researchers with site-directed mutagenesis and drug design. In the present work, we propose a computational approach, based on the SVM-KNN algorithm, that uses evolutionary information of spatially neighbouring residues to predict protein-RNA interaction sites. The overall success rate obtained by 5-fold cross-validation is 78.00%, which is comparable to or better than that of other existing methods, indicating that our method is very promising for identifying and predicting protein-RNA interaction sites.


Citations: 1

Prediction of human protein kinase substrate specificities

2010 IEEE International Conference on Bioinformatics and Biomedicine (BIBM)

Pub Date : 2010-12-01 DOI: 10.1109/BIBM.2010.5706573

Javad Safaei, Ján Manuch, Arvind Gupta, L. Stacho, S. Pelech

In this paper, we propose a new algorithm to predict the phosphorylation site specificities of 478 human protein kinases based on the primary structures of the catalytic domains of these enzymes. Existing methods either deduce the specificity of a protein kinase by aligning the amino acid sequences of the phospho-sites targeted by the kinase to generate a consensus sequence, or use machine learning models for recognition. However, for most protein kinases, few if any substrates have been experimentally identified by protein sequencing and mass spectrometry. In this work, we used mutual information from a training set of over 200 protein kinase consensus phospho-site sequences, together with predicted amino acid interactions between kinases and their substrate phospho-sites, to generate position-specific scoring matrices (PSSMs). The results demonstrate that, using our algorithm, knowledge of the primary amino acid sequence of the catalytic domain of these kinases is sufficient to predict their phosphorylation site specificities and their PSSMs.
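To make the notion of a position-specific scoring matrix concrete, the sketch below builds a log-odds PSSM from aligned phospho-site peptides, which corresponds to the alignment-based approach the abstract attributes to existing methods. It does not reproduce the authors' prediction of PSSMs from catalytic-domain sequences. The uniform background, the pseudocount, the function names, and the toy peptides are all assumptions made for illustration.

```python
from math import log2

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(peptides, pseudocount=1.0):
    """Return one dict per aligned position with log2-odds scores
    relative to a uniform amino acid background."""
    length = len(peptides[0])
    if any(len(p) != length for p in peptides):
        raise ValueError("peptides must be aligned to the same length")
    background = 1.0 / len(AMINO_ACIDS)
    pssm = []
    for pos in range(length):
        # Pseudocounts keep unseen residues from scoring minus infinity.
        counts = {aa: pseudocount for aa in AMINO_ACIDS}
        for pep in peptides:
            counts[pep[pos]] += 1
        total = sum(counts.values())
        pssm.append({aa: log2(counts[aa] / total / background) for aa in AMINO_ACIDS})
    return pssm


def score(pssm, peptide):
    """Sum the per-position log-odds scores of a candidate peptide."""
    return sum(column[aa] for column, aa in zip(pssm, peptide))


# Made-up toy peptides centred on a phosphorylated serine (index 3).
toy_sites = ["RRASL", "RRGSV", "KRQSI"]
toy_pssm = build_pssm(toy_sites)
```

A candidate site is then ranked by `score(toy_pssm, candidate)`: residues that are enriched at a position in the training peptides contribute positive terms, and residues never observed there contribute negative ones.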


Citations: 4

Unsupervised discovery of fuzzy patterns in gene expression data

2010 IEEE International Conference on Bioinformatics and Biomedicine (BIBM)

Pub Date : 2010-12-01 DOI: 10.1109/BIBM.2010.5706575

Gene P. K. Wu, Keith C. C. Chan, A. Wong, Bin Wu

When the tissue classes of the samples are given, discovering patterns from gene expression levels is regarded as a classification problem and is solved as a discrete-data problem by discretizing the expression levels of each gene into intervals that maximize the interdependence between that gene and the class labels. However, when class information is unavailable, discovering gene expression patterns becomes difficult. This paper attempts to tackle this important problem. For a gene pool with a large number of genes, we first cluster the genes into smaller groups. In each group, we use the representative gene, the one with the highest interdependence with the others in the group, to drive the discretization of the expression levels of the other genes. By treating intervals as discrete events, association patterns can be discovered. If the gene groups obtained are crisp clusters, significant patterns that span different clusters cannot be found. This paper presents a new method of “fuzzifying” the crisp attribute clusters for that purpose. To evaluate the effectiveness of our approach, we first apply the above-described procedure to a synthetic dataset and then to a gene expression dataset with known class labels. The class labels are not used in either analysis; they are used later as the ground truth in a classification problem to assess the algorithm's effectiveness in fuzzy gene clustering and discretization. The results show the efficacy of the proposed method.



Citations: 4

Module-based biomarker discovery in breast cancer

2010 IEEE International Conference on Bioinformatics and Biomedicine (BIBM)

Pub Date : 2010-12-01 DOI: 10.1109/BIBM.2010.5706590

Yuji Zhang, J. Xuan, R. Clarke, H. Ressom

The availability of genome-wide biological network data opens up new possibilities to discover novel biomarkers and elucidate cancer-related complex mechanisms at the network level. In this paper, we propose a novel module-based feature selection framework, which integrates biological network information and gene expression data to identify biomarkers, not as individual genes but as functional modules. A large-scale analysis of the ensemble feature selection concept is also presented. The method combines features selected from multiple runs on different data subsamples to increase the reliability and classification accuracy of the final set of selected features. The results from four breast cancer studies demonstrate that the identified module biomarkers achieve: i) higher classification accuracy in independent validation datasets; ii) better reproducibility than individual gene biomarkers; iii) improved biological interpretability; and iv) enhanced enrichment in cancer-related “disease drivers”.
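The ensemble idea of combining features selected over multiple subsampled runs can be sketched as follows. This is a generic illustration under stated assumptions, not the authors' module-based framework: features are plain columns rather than network modules, the per-run ranking uses a simple class-mean gap, and the names `mean_gap` and `ensemble_select` and all parameter defaults are hypothetical.

```python
import random
from collections import Counter


def mean_gap(rows, labels, j):
    """Absolute difference between the two class means of feature j."""
    a = [r[j] for r, y in zip(rows, labels) if y == 0]
    b = [r[j] for r, y in zip(rows, labels) if y == 1]
    return abs(sum(a) / len(a) - sum(b) / len(b))


def ensemble_select(rows, labels, top_k=2, n_runs=50, frac=0.8, min_freq=0.6, seed=0):
    """Keep the features that land in the top_k of at least min_freq of the runs."""
    rng = random.Random(seed)
    by_class = {c: [i for i, y in enumerate(labels) if y == c] for c in (0, 1)}
    n_feat = len(rows[0])
    hits = Counter()
    for _ in range(n_runs):
        # Stratified subsample so that both classes are present in every run.
        idx = [i for ids in by_class.values()
               for i in rng.sample(ids, max(1, int(frac * len(ids))))]
        sub_rows = [rows[i] for i in idx]
        sub_labels = [labels[i] for i in idx]
        ranked = sorted(range(n_feat),
                        key=lambda j: mean_gap(sub_rows, sub_labels, j),
                        reverse=True)
        hits.update(ranked[:top_k])
    return sorted(j for j, count in hits.items() if count / n_runs >= min_freq)


# Toy data: feature 0 separates the classes, feature 1 is noise, feature 2 is constant.
rows = [[0, 0.2, 1], [0, 0.9, 1], [0, 0.5, 1], [0, 0.1, 1], [0, 0.7, 1],
        [5, 0.3, 1], [5, 0.8, 1], [5, 0.4, 1], [5, 0.6, 1], [5, 0.0, 1]]
labels = [0] * 5 + [1] * 5
stable = ensemble_select(rows, labels, top_k=1)
```

Selection frequency across runs acts as a stability score: a feature that ranks highly only on a particular subsample is filtered out, which is the reliability gain the abstract refers to.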


Citations: 4

Exploitation of 3D Stereotactic Surface Projection for automated classification of Alzheimer's disease according to dementia levels

2010 IEEE International Conference on Bioinformatics and Biomedicine (BIBM)

Pub Date : 2010-12-01 DOI: 10.1109/BIBM.2010.5706620

M. Ayhan, Ryan G. Benton, Vijay V. Raghavan, Suresh K. Choubey

Alzheimer's disease (AD) is a major cause of dementia. Previous studies have indicated that the use of features derived from Positron Emission Tomography (PET) scans leads to more accurate and earlier diagnosis of AD than the traditional approach to determining dementia ratings, which uses a combination of clinical assessments such as memory tests. In this study, we compare Naïve Bayes (NB), a probabilistic learner, with variations of Support Vector Machines (SVMs), a geometric learner, for the automatic diagnosis of Alzheimer's disease. 3D Stereotactic Surface Projection (3D-SSP) is used to extract features from PET scans. At the most detailed level, the dimensionality of the feature space is very high, resulting in 15964 features. Since classifier performance can degrade in the presence of a large number of features, we evaluate the benefits of a correlation-based feature selection method for finding a small number of highly relevant features.


Citations: 3

A new method for measuring the semantic similarity on gene ontology

2010 IEEE International Conference on Bioinformatics and Biomedicine (BIBM)

Pub Date : 2010-12-01 DOI: 10.1109/BIBM.2010.5706623

Ying Shen, Shaohong Zhang, H. Wong

Semantic similarity defined on the Gene Ontology (GO) aims to capture the functional relationship between different biological processes, molecular functions, or cellular components. In this paper, a novel method for measuring semantic similarity on GO, namely the Shortest Path (SP) algorithm, is proposed based on both the GO structure information and the properties of the terms. The proposed algorithm searches for the shortest path that connects two terms and uses the sum of the weights on that path to compute the semantic similarity of GO terms. A method for evaluating the nonlinear correlation between two variables is also introduced for validation. Extensive experiments conducted on two public gene expression datasets demonstrate the overall superiority of the SP method over the other state-of-the-art methods evaluated.
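The core step, finding the shortest weighted path between two terms and turning its total weight into a similarity, can be sketched with Dijkstra's algorithm. This is an illustration only: the abstract does not say how edge weights are derived or how path length is mapped to similarity, so the edge weights, the mapping 1 / (1 + distance), the term names, and the function names below are all assumptions.

```python
import heapq


def build_graph(edges):
    """Build an undirected adjacency map from (term_a, term_b, weight) triples."""
    graph = {}
    for a, b, w in edges:
        graph.setdefault(a, {})[b] = w
        graph.setdefault(b, {})[a] = w
    return graph


def shortest_path_length(graph, source, target):
    """Dijkstra's algorithm; returns the minimal sum of edge weights."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == target:
            return d
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry
        for nxt, w in graph.get(node, {}).items():
            nd = d + w
            if nd < dist.get(nxt, float("inf")):
                dist[nxt] = nd
                heapq.heappush(heap, (nd, nxt))
    return float("inf")


def similarity(graph, term_a, term_b):
    """Hypothetical mapping from path length to a similarity in (0, 1]."""
    return 1.0 / (1.0 + shortest_path_length(graph, term_a, term_b))


# Toy ontology fragment with placeholder term names (not real GO identifiers).
toy_graph = build_graph([
    ("root", "a", 1.0), ("root", "b", 1.0),
    ("a", "a1", 0.5), ("a", "a2", 0.5), ("b", "b1", 0.5),
])
sibling_similarity = similarity(toy_graph, "a1", "a2")
cousin_similarity = similarity(toy_graph, "a1", "b1")
```

In this toy graph, sibling terms under the same parent are connected by a shorter path than terms in different branches, so they receive the higher similarity. Giving deeper edges smaller weights is one way the structure of the ontology could enter the score.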


Citations: 17
